We consider minimum variance estimation within the sparse linear Gaussian model (SLGM). A sparse vector is to be estimated from a linearly transformed version embedded in Gaussian noise. Our analysis is based on the theory of reproducing kernel Hilbert spaces (RKHS). After a characterization of the RKHS associated with the SLGM, we derive novel lower bounds on the minimum variance achievable by estimators with a prescribed bias function. This includes the important case of unbiased estimation. The variance bounds are obtained via an orthogonal projection of the prescribed mean function onto a subspace of the RKHS associated with the SLGM. Furthermore, we specialize our bounds to compressed sensing measurement matrices and express them in terms of the restricted isometry and coherence parameters. For the special case of the SLGM given by the sparse signal in noise model (SSNM), we derive closed-form expressions of the minimum achievable variance (Barankin bound) and the corresponding locally minimum variance estimator. We also analyze the effects of exact and approximate sparsity information and show that the minimum achievable variance for exact sparsity is not a limiting case of that for approximate sparsity. Finally, we compare our bounds with the variance of three well-known estimators, namely, the maximum-likelihood estimator, the hard-thresholding estimator, and compressive reconstruction using the orthogonal matching pursuit.
I. Introduction

We study the problem of estimating the value $\mathbf{g}(\mathbf{x})$ of a known vector-valued function $\mathbf{g}(\cdot)$ evaluated at the unknown parameter vector $\mathbf{x} \in \mathbb{R}^N$. It is known that $\mathbf{x}$ is $S$-sparse, i.e., at most $S$ of its entries are nonzero, where $S \in [N] = \{1, \ldots, N\}$ (typically $S \ll N$). While the sparsity degree $S$ is known, the set of positions of the nonzero entries of $\mathbf{x}$, i.e., the support $\operatorname{supp}(\mathbf{x}) \subseteq [N]$, is unknown. The estimation of $\mathbf{g}(\mathbf{x})$ is based on an observed random vector $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \in \mathbb{R}^M$, with a known system matrix $\mathbf{H} \in \mathbb{R}^{M \times N}$ and independent and identically distributed (i.i.d.) Gaussian noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ with known noise variance $\sigma^2 > 0$. We assume that the minimum number of linearly dependent columns of $\mathbf{H}$ is larger than $S$.

The data model described above will be termed the sparse linear Gaussian model (SLGM). The SLGM is relevant, e.g., to sparse channel estimation [1], where the sparse parameter vector $\mathbf{x}$ represents the tap coefficients of a linear time-invariant channel and the system matrix $\mathbf{H}$ represents the training signal. More generally, the SLGM can be used for any type of sparse deconvolution [2]. The special case of the SLGM obtained for $\mathbf{H} = \mathbf{I}$ (so that $M = N$ and $\mathbf{y} = \mathbf{x} + \mathbf{n}$) will be referred to as the sparse signal in noise model (SSNM). The SSNM can be used, e.g., for sparse channel estimation [1] employing an orthogonal training signal [3] and for image denoising employing an orthonormal wavelet basis [4].

A fundamental question, to be considered in this work, is how to exploit the knowledge of the sparsity degree $S$. In contrast to compressed sensing (CS), where the sparsity is exploited for compression [5]–[7], here we investigate how much the sparsity assumption helps us improve the accuracy of estimating $\mathbf{g}(\mathbf{x})$. Related questions have been previously addressed for the SLGM in [4] and [8]–[13]. In [8] and [9], bounds on the minimax risk and approximate minimax estimators whose worst-case risk is close to these bounds have been derived for the SLGM. An asymptotic analysis of minimax estimation for the SSNM has been given in the seminal work [4], [10]. In the context of minimum variance estimation (MVE), which is relevant to our present work, lower bounds on the minimum achievable variance for the SLGM have been derived recently. In particular, the Cramér–Rao bound (CRB) for the SLGM has been derived and analyzed in [11] and [12]. Furthermore, in our previous work [13], we derived lower and upper bounds on the minimum achievable variance of unbiased estimators for the SSNM.

The contributions of the present paper can be summarized as follows. First, we present novel CRB-type lower bounds on the variance of estimators for the SLGM. These bounds are derived by an application of the mathematical framework of reproducing kernel Hilbert spaces (RKHS) [14]–[16]. Since they hold for any estimator with a prescribed mean function, they are also lower bounds on the minimum achievable variance (also known as Barankin bound) for the SLGM. The bounds are tighter than those presented in [11], [12], and they have an appealing form in that they are scaled versions of the conventional CRB obtained for the nonsparse case [17], [18]. We note that our RKHS approach is quite different from the technique used in [13]. Also, a shortcoming of the lower bounds presented in [11]–[13] is the fact that they exhibit a discontinuity when passing from the case $\|\mathbf{x}\|_0 = S$ (i.e., $\mathbf{x}$ has exactly $S$ nonzero values) to the case $\|\mathbf{x}\|_0 < S$ (i.e., $\mathbf{x}$ has fewer than $S$ nonzero values). For unbiased estimation, we derive a lower bound that is tighter than the bounds in [11]–[13] and, moreover, a continuous function of $\mathbf{x}$. In particular, this bound exhibits a smooth transition between the two regimes given by $\|\mathbf{x}\|_0 = S$ and $\|\mathbf{x}\|_0 < S$. Based on the fact that the linear CS recovery problem is an instance of the SLGM, we specialize our lower bounds to system matrices that are CS measurement matrices, and we express them in terms of the restricted isometry and coherence parameters of these matrices.

Furthermore, for the SSNM, we derive expressions of the minimum achievable variance at a given parameter vector $\mathbf{x} = \mathbf{x}_0$ and of the locally minimum variance (LMV) estimator, i.e., the estimator achieving the minimum variance at $\mathbf{x}_0$. Simplified expressions of the minimum achievable variance and the LMV estimator are obtained for a certain subclass of "diagonal" bias functions (which includes the unbiased case).

Finally, we consider the SLGM with an approximate sparsity constraint and show that the minimum 
achievable variance under an exact sparsity constraint is not a limiting case of the minimum achievable variance 
under an approximate sparsity constraint. 

A central aspect of this paper is the application of the mathematical framework of RKHS [14] to the SLGM. The RKHS framework has been previously applied to classical estimation in the seminal work reported in [15] and [16], and our present treatment is substantially based on that work. However, to the best of our knowledge, the RKHS framework has not been applied to the SLGM or, more generally, to the estimation of (functions of) sparse vectors. The sparse case is specific in that we are considering functions whose domain is the set of $S$-sparse vectors. For $S < N$, the interior of this set is empty, and thus there do not exist derivatives in every possible direction. This lack of a differentiable structure makes the characterization of the RKHS a somewhat delicate matter.

The remainder of this paper is organized as follows. We begin in Section II with formal statements of the SLGM and SSNM and continue in Section III with a review of basic elements of MVE. In Section IV, we review some fundamentals of RKHSs and the application of RKHSs to MVE. In Section V, we characterize and discuss the RKHS associated with the SLGM. For the SLGM, we then use the RKHS framework to present formal characterizations of the class of bias functions allowing for finite-variance estimators, of the minimum achievable variance (Barankin bound), and of the LMV estimator. We also present a result on the shape of the Barankin bound. In Section VI, we reinterpret the sparse CRB of [11] from the RKHS perspective, and we present two novel lower variance bounds for the SLGM. In Section VII, we specialize the bounds of Section VI to system matrices that are CS measurement matrices. The important special case given by the SSNM is discussed in Section VIII, where we derive closed-form expressions of the minimum achievable variance (Barankin bound) and of the corresponding LMV estimator. A discussion of the effects of exact and approximate sparsity information from the MVE perspective is presented in Section IX. Finally, in Section X, we present numerical results comparing our theoretical bounds with the actual variance of some popular estimation schemes.

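Notation and basic definitions. The sets of real, nonnegative real, natural, and nonnegative integer numbers are denoted by $\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{N} = \{1, 2, \ldots\}$, and $\mathbb{Z}_+ = \{0, 1, \ldots\}$, respectively. For $L \in \mathbb{N}$, we define $[L] \triangleq \{1, \ldots, L\}$. The space of all discrete-argument functions $f[\cdot]: \mathcal{T} \to \mathbb{R}$ (with $\mathcal{T} \subseteq \mathbb{Z}$) for which $\sum_{l \in \mathcal{T}} f^2[l] < \infty$ is denoted by $\ell^2(\mathcal{T})$, with associated norm $\|f[\cdot]\|_{\mathcal{T}} \triangleq \sqrt{\sum_{l \in \mathcal{T}} f^2[l]}$. The Kronecker delta $\delta_{k,l}$ is 1 if $k = l$ and 0 otherwise. Given an $N$-tuple of nonnegative integers (a "multi-index") $\mathbf{p} = (p_1 \cdots p_N)^T \in \mathbb{Z}_+^N$ [19], we define $\mathbf{p}! \triangleq \prod_{l \in [N]} p_l!$, $|\mathbf{p}| \triangleq \sum_{l \in [N]} p_l$, and $\mathbf{x}^{\mathbf{p}} \triangleq \prod_{l \in [N]} (x_l)^{p_l}$ (for $\mathbf{x} \in \mathbb{R}^N$). Given two multi-indices $\mathbf{p}_1, \mathbf{p}_2 \in \mathbb{Z}_+^N$, the inequality $\mathbf{p}_1 \le \mathbf{p}_2$ is understood to hold elementwise, i.e., $p_{1,l} \le p_{2,l}$ for all $l \in [N]$.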

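Lowercase (uppercase) boldface letters denote column vectors (matrices). The superscript $^T$ stands for transposition. The $k$th unit vector is denoted by $\mathbf{e}_k$, and the identity matrix by $\mathbf{I}$. For a rectangular matrix $\mathbf{H} \in \mathbb{R}^{M \times N}$, we denote by $\mathbf{H}^{\dagger}$ its Moore–Penrose pseudoinverse [20], by $\ker(\mathbf{H}) = \{\mathbf{x} \in \mathbb{R}^N \,|\, \mathbf{H}\mathbf{x} = \mathbf{0}\}$ its kernel (or null space), by $\operatorname{span}(\mathbf{H}) = \{\mathbf{y} \in \mathbb{R}^M \,|\, \exists\, \mathbf{x} \in \mathbb{R}^N: \mathbf{y} = \mathbf{H}\mathbf{x}\}$ its column span, and by $\operatorname{rank}(\mathbf{H})$ its rank. For a square matrix $\mathbf{H} \in \mathbb{R}^{N \times N}$, we denote by $\operatorname{tr}(\mathbf{H})$, $\det(\mathbf{H})$, and $\mathbf{H}^{-1}$ its trace, determinant, and inverse (if it exists), respectively. The $k$th entry of a vector $\mathbf{x}$ is denoted by $(\mathbf{x})_k \triangleq x_k$, and the entry in the $k$th row and $l$th column of a matrix $\mathbf{H}$ by $(\mathbf{H})_{k,l} \triangleq H_{k,l}$. The support (i.e., set of indices of all nonzero entries) and the number of nonzero entries of a vector $\mathbf{x}$ are denoted by $\operatorname{supp}(\mathbf{x})$ and $\|\mathbf{x}\|_0 = |\operatorname{supp}(\mathbf{x})|$, respectively. Given an index set $\mathcal{I} \subseteq [N]$, we denote by $\mathbf{x}^{\mathcal{I}} \in \mathbb{R}^N$ the vector obtained from $\mathbf{x} \in \mathbb{R}^N$ by zeroing all entries except those indexed by $\mathcal{I}$, and by $\mathbf{H}_{\mathcal{I}} \in \mathbb{R}^{M \times |\mathcal{I}|}$ the matrix formed by those columns of $\mathbf{H} \in \mathbb{R}^{M \times N}$ that are indexed by $\mathcal{I}$. The $p$-norm of a vector $\mathbf{x} \in \mathbb{R}^N$ is defined as $\|\mathbf{x}\|_p \triangleq \big(\sum_{k \in [N]} |x_k|^p\big)^{1/p}$.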

II. The Sparse Linear Gaussian Model 

We will first present a more detailed statement of the SLGM. Let $\mathbf{x} \in \mathbb{R}^N$ be an unknown parameter vector that is known to be $S$-sparse in the sense that at most $S$ of its entries are nonzero, i.e., $\|\mathbf{x}\|_0 \le S$, with a known sparsity degree $S \in [N]$ (typically $S \ll N$). We will express this $S$-sparsity in terms of a parameter set $\mathcal{X}_S$, i.e.,

$$\mathbf{x} \in \mathcal{X}_S, \quad \text{with } \mathcal{X}_S \triangleq \{\mathbf{x}' \in \mathbb{R}^N \,|\, \|\mathbf{x}'\|_0 \le S\} \subseteq \mathbb{R}^N. \tag{1}$$

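In the limiting case where $S$ is equal to the dimension of $\mathbf{x}$, i.e., $S = N$, we have $\mathcal{X}_S = \mathbb{R}^N$. Note that the support $\operatorname{supp}(\mathbf{x}) \subseteq [N]$ is unknown. We observe a linearly transformed and noisy version of $\mathbf{x}$,

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \in \mathbb{R}^M, \tag{2}$$

where $\mathbf{H} \in \mathbb{R}^{M \times N}$ is a known matrix and $\mathbf{n} \in \mathbb{R}^M$ is i.i.d. Gaussian noise, i.e., $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$, with a known noise variance $\sigma^2 > 0$. It follows that the probability density function (pdf) of the observation $\mathbf{y}$ for a specific value of $\mathbf{x}$ is given by

$$f_{\mathbf{H}}(\mathbf{y};\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{M/2}} \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2\Big). \tag{3}$$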



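We assume that

$$\operatorname{spark}(\mathbf{H}) > S, \tag{4}$$

where $\operatorname{spark}(\mathbf{H})$ denotes the minimum number of linearly dependent columns of $\mathbf{H}$ [21], [22]. Note that we also allow $M < N$ (this case is relevant to CS methods as discussed in Section VII); however, condition (4) implies that $M \ge S$. Condition (4) is weaker than the standard condition $\operatorname{spark}(\mathbf{H}) > 2S$ [11]. Still, the standard condition is reasonable since otherwise one can find two different parameter vectors $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}_S$ for which $f_{\mathbf{H}}(\mathbf{y};\mathbf{x}_1) = f_{\mathbf{H}}(\mathbf{y};\mathbf{x}_2)$ for all $\mathbf{y}$, which implies that one cannot distinguish between $\mathbf{x}_1$ and $\mathbf{x}_2$ based on knowledge of $\mathbf{y}$. Finally, we note that the assumption of i.i.d. noise in (2) does not imply a loss of generality. Indeed, consider an SLGM $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ where $\mathbf{n}$ is not i.i.d. but has some positive definite (hence, nonsingular) covariance matrix $\mathbf{C}$. Then, the "whitened observation" $\tilde{\mathbf{y}} = \mathbf{C}^{-1/2}\mathbf{y}$ [23], where $\mathbf{C}^{-1/2}$ is the inverse of the matrix square root $\mathbf{C}^{1/2}$ [24], can be written as $\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\mathbf{x} + \tilde{\mathbf{n}}$, with $\tilde{\mathbf{H}} = \mathbf{C}^{-1/2}\mathbf{H}$ and $\tilde{\mathbf{n}} = \mathbf{C}^{-1/2}\mathbf{n}$. It can be verified that $\tilde{\mathbf{H}}$ also satisfies (4) and $\tilde{\mathbf{n}}$ is i.i.d. with variance $\tilde{\sigma}^2 = 1$, i.e., $\tilde{\mathbf{n}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.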
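The whitening argument can be checked numerically. The following Python/NumPy snippet is a minimal sketch under stated assumptions: the covariance $\mathbf{C}$, the matrix sizes, and the helper names `spark` and `inv_sqrtm` are hypothetical choices made only for this illustration, and the brute-force spark computation is exponential in $N$, so it is meant for tiny matrices only. It confirms that $\mathbf{C}^{-1/2}\mathbf{C}\mathbf{C}^{-1/2} = \mathbf{I}$ (so $\tilde{\mathbf{n}}$ is white) and that whitening leaves the spark, and hence condition (4), unchanged.

```python
import itertools

import numpy as np


def spark(H, tol=1e-10):
    """Smallest number of linearly dependent columns of H (brute force, tiny H only).

    Returns N + 1 if all N columns are linearly independent (a common convention).
    """
    _, N = H.shape
    for k in range(1, N + 1):
        for cols in itertools.combinations(range(N), k):
            if np.linalg.matrix_rank(H[:, list(cols)], tol=tol) < k:
                return k
    return N + 1


def inv_sqrtm(C):
    """C^{-1/2} of a symmetric positive definite C via its eigendecomposition."""
    w, Q = np.linalg.eigh(C)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T


rng = np.random.default_rng(0)
M, N = 4, 6
H = rng.standard_normal((M, N))
A = rng.standard_normal((M, M))
C = A @ A.T + M * np.eye(M)            # hypothetical positive definite noise covariance

W = inv_sqrtm(C)                        # whitening matrix C^{-1/2}
H_tilde = W @ H                         # whitened system matrix
noise_cov_whitened = W @ C @ W.T        # covariance of n_tilde = C^{-1/2} n

print(np.allclose(noise_cov_whitened, np.eye(M)))
print(spark(H), spark(H_tilde))         # whitening does not change the spark
```

Since $\mathbf{C}^{-1/2}$ is invertible, a set of columns of $\tilde{\mathbf{H}}$ is linearly dependent exactly when the same columns of $\mathbf{H}$ are, which is why the two spark values coincide.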

The task considered in this paper is estimation of the function value $\mathbf{g}(\mathbf{x})$ from the observation $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where the parameter function $\mathbf{g}(\cdot): \mathcal{X}_S \to \mathbb{R}^P$ is a known deterministic function. The estimate $\hat{\mathbf{g}} = \hat{\mathbf{g}}(\mathbf{y}) \in \mathbb{R}^P$ is derived from $\mathbf{y}$ via a deterministic estimator $\hat{\mathbf{g}}(\cdot): \mathbb{R}^M \to \mathbb{R}^P$. We allow $\hat{\mathbf{g}} \in \mathbb{R}^P$ without constraining $\hat{\mathbf{g}}$ to be in $\mathbf{g}(\mathcal{X}_S) \triangleq \{\mathbf{g}(\mathbf{x}) \,|\, \mathbf{x} \in \mathcal{X}_S\}$, even though it is known that $\mathbf{x} \in \mathcal{X}_S$. The reason for not enforcing the sparsity constraint $\hat{\mathbf{g}} \in \mathbf{g}(\mathcal{X}_S)$ is twofold: first, it would complicate the analysis; second, it would typically result in a worse achievable estimator performance (in terms of mean squared error) since it restricts the class of allowed estimators. In particular, it has been shown that a sparsity constraint can increase the worst-case risk of the resulting estimators significantly [25].

Estimation of the parameter vector $\mathbf{x}$ itself is a special case obtained by choosing the parameter function as the identity mapping, i.e., $\mathbf{g}(\mathbf{x}) = \mathbf{x}$, which implies $P = N$. Again, we allow $\hat{\mathbf{x}} \in \mathbb{R}^N$ and do not constrain $\hat{\mathbf{x}}$ to be in $\mathcal{X}_S$.

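In what follows, it will be convenient to denote the SLGM-based estimation problem by the triple

$$\mathcal{E}_{\text{SLGM}} \triangleq \big(\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), \mathbf{g}(\cdot)\big),$$

where $f_{\mathbf{H}}(\mathbf{y};\mathbf{x})$ is given by (3) and will be referred to as the statistical model. A related estimation problem is based on the linear Gaussian model (LGM) [17], [26]–[28], for which $\mathbf{x} \in \mathbb{R}^N$ rather than $\mathbf{x} \in \mathcal{X}_S$; this problem will be denoted by

$$\mathcal{E}_{\text{LGM}} \triangleq \big(\mathbb{R}^N, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), \mathbf{g}(\cdot)\big).$$

The SLGM shares with the LGM the observation model (2) and the statistical model (3); it is obtained from the LGM by restricting the parameter set $\mathbb{R}^N$ to the set of $S$-sparse vectors, $\mathcal{X}_S$. For $S = N$, the SLGM reduces to the LGM. Another important special case of the SLGM is given by the SSNM, for which $\mathbf{H} = \mathbf{I}$, $M = N$, and

$$\mathbf{y} = \mathbf{x} + \mathbf{n},$$

where $\mathbf{x} \in \mathcal{X}_S$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ with known variance $\sigma^2 > 0$. The SSNM-based estimation problem will be denoted as

$$\mathcal{E}_{\text{SSNM}} \triangleq \big(\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), \mathbf{g}(\cdot)\big).$$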
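As a concrete instance of $\mathcal{E}_{\text{SSNM}}$, the sketch below draws one SSNM observation and computes the maximum-likelihood estimate of $\mathbf{x}$ under the constraint $\mathbf{x} \in \mathcal{X}_S$. It is an illustrative sketch, not code from this work: the helper names `sample_ssnm` and `ml_ssnm` and all numerical values are hypothetical. Maximizing (3) with $\mathbf{H} = \mathbf{I}$ over $\mathcal{X}_S$ amounts to minimizing $\|\mathbf{y} - \mathbf{x}\|_2^2$ subject to $\|\mathbf{x}\|_0 \le S$, whose solution keeps the $S$ entries of $\mathbf{y}$ with largest magnitude (ties broken arbitrarily).

```python
import numpy as np


def sample_ssnm(x, sigma, rng):
    """Draw one SSNM observation y = x + n with n ~ N(0, sigma^2 I)."""
    return x + sigma * rng.standard_normal(x.shape)


def ml_ssnm(y, S):
    """ML estimate of an S-sparse x from y = x + n.

    Minimizing ||y - x||_2^2 subject to ||x||_0 <= S keeps the S entries of y
    with largest magnitude and sets all other entries to zero.
    """
    x_hat = np.zeros_like(y)
    keep = np.argsort(np.abs(y))[len(y) - S:]   # S largest magnitudes; empty if S = 0
    x_hat[keep] = y[keep]
    return x_hat


rng = np.random.default_rng(1)
N, S, sigma = 8, 2, 0.1
x = np.zeros(N)
x[[1, 5]] = [3.0, -2.0]                  # a hypothetical parameter vector in X_S
y = sample_ssnm(x, sigma, rng)
x_hat = ml_ssnm(y, S)
print(np.flatnonzero(x_hat), x_hat[[1, 5]])
```

Note that this estimator respects the sparsity constraint by construction, whereas the MVE analysis in this paper deliberately allows estimates outside $\mathcal{X}_S$.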

III. Basic Elements of Minimum Variance Estimation 

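Let us consider a general estimation problem $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), \mathbf{g}(\cdot))$ based on an arbitrary parameter set $\mathcal{X} \subseteq \mathbb{R}^N$ and an arbitrary statistical model $f(\mathbf{y};\mathbf{x})$. The general goal in the design of an estimator $\hat{\mathbf{g}}(\cdot)$ is that $\hat{\mathbf{g}}(\mathbf{y})$ should be close to the true value $\mathbf{g}(\mathbf{x})$. A frequently used criterion for assessing the quality of an estimator $\hat{\mathbf{g}}(\mathbf{y})$ is the mean squared error (MSE) defined as

$$\varepsilon \triangleq \mathrm{E}_{\mathbf{x}}\big\{\|\hat{\mathbf{g}}(\mathbf{y}) - \mathbf{g}(\mathbf{x})\|_2^2\big\} = \int \|\hat{\mathbf{g}}(\mathbf{y}) - \mathbf{g}(\mathbf{x})\|_2^2\, f(\mathbf{y};\mathbf{x})\, d\mathbf{y}.$$

Here, $\mathrm{E}_{\mathbf{x}}\{\cdot\}$ denotes the expectation operation with respect to the pdf $f(\mathbf{y};\mathbf{x})$; the subscript in $\mathrm{E}_{\mathbf{x}}$ indicates the dependence on the parameter vector $\mathbf{x}$ parametrizing $f(\mathbf{y};\mathbf{x})$. We will write $\varepsilon(\hat{\mathbf{g}}(\cdot);\mathbf{x})$ to indicate the dependence of the MSE on the estimator $\hat{\mathbf{g}}(\cdot)$ and the parameter vector $\mathbf{x}$. In general, there does not exist an estimator $\hat{\mathbf{g}}(\cdot)$ that minimizes the MSE simultaneously for all $\mathbf{x} \in \mathcal{X}$ [30]. This follows from the fact that minimizing the MSE at a given parameter vector $\mathbf{x}_0$ always yields zero MSE; this is achieved by the trivial estimator $\hat{\mathbf{g}}(\mathbf{y}) \equiv \mathbf{g}(\mathbf{x}_0)$, which ignores the observation $\mathbf{y}$.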

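A popular rationale for the design of good estimators is MVE. The MSE can be decomposed as

$$\varepsilon(\hat{\mathbf{g}}(\cdot);\mathbf{x}) = \|\mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x})\|_2^2 + v(\hat{\mathbf{g}}(\cdot);\mathbf{x}), \tag{5}$$

with the bias $\mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x}) \triangleq \mathrm{E}_{\mathbf{x}}\{\hat{\mathbf{g}}(\mathbf{y})\} - \mathbf{g}(\mathbf{x})$ and the variance $v(\hat{\mathbf{g}}(\cdot);\mathbf{x}) \triangleq \mathrm{E}_{\mathbf{x}}\big\{\|\hat{\mathbf{g}}(\mathbf{y}) - \mathrm{E}_{\mathbf{x}}\{\hat{\mathbf{g}}(\mathbf{y})\}\|_2^2\big\}$. In MVE, one fixes the bias on the entire parameter set $\mathcal{X}$, i.e., one requires that

$$\mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x}) = \mathbf{c}(\mathbf{x}), \quad \text{for all } \mathbf{x} \in \mathcal{X}, \tag{6}$$

with a prescribed bias function $\mathbf{c}(\cdot): \mathcal{X} \to \mathbb{R}^P$, and attempts to minimize the variance $v(\hat{\mathbf{g}}(\cdot);\mathbf{x})$ among all estimators with the given bias function $\mathbf{c}(\cdot)$. Fixing the bias function is equivalent to fixing the estimator's mean function, i.e., $\mathrm{E}_{\mathbf{x}}\{\hat{\mathbf{g}}(\mathbf{y})\} = \boldsymbol{\gamma}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$, with the prescribed mean function $\boldsymbol{\gamma}(\mathbf{x}) \triangleq \mathbf{c}(\mathbf{x}) + \mathbf{g}(\mathbf{x})$. Unbiased estimation is obtained as a special case for $\mathbf{c}(\mathbf{x}) \equiv \mathbf{0}$ or, equivalently, $\boldsymbol{\gamma}(\mathbf{x}) \equiv \mathbf{g}(\mathbf{x})$. Fixing the bias can be viewed as a kind of "regularization" of the set of considered estimators [15], [30], since it excludes useless estimators such as $\hat{\mathbf{g}}(\mathbf{y}) \equiv \mathbf{g}(\mathbf{x}_0)$. Another justification for considering a fixed bias function is that under mild conditions, for a large number of i.i.d. observations $\{\mathbf{y}_i\}_{i \in [L]}$, the bias term dominates in the decomposition (5). Thus, in order to achieve a small MSE in that case, an estimator has to be at least asymptotically unbiased, i.e., one has to require that, for a large number of observations, $\mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x}) \approx \mathbf{0}$ for all $\mathbf{x} \in \mathcal{X}$.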
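The decomposition (5) also holds exactly for empirical moments, which makes it easy to check by simulation. The following sketch (hypothetical parameter values; `hard_threshold` is a helper defined here for illustration only) estimates the MSE, squared bias, and variance of a hard-thresholding estimator of $\mathbf{x}$ in the SSNM, i.e., with $\mathbf{g}(\mathbf{x}) = \mathbf{x}$. Hard thresholding is biased at this $\mathbf{x}$, so both terms of (5) contribute.

```python
import numpy as np


def hard_threshold(y, tau):
    """Hard-thresholding estimator: keep y_k if |y_k| > tau, otherwise output 0."""
    return np.where(np.abs(y) > tau, y, 0.0)


rng = np.random.default_rng(2)
N, sigma, tau, trials = 5, 1.0, 2.0, 200_000
x = np.array([0.0, 0.0, 1.5, 0.0, -4.0])          # fixed (hypothetical) parameter vector

Y = x + sigma * rng.standard_normal((trials, N))  # SSNM observations, one per row
G = hard_threshold(Y, tau)                         # estimates of g(x) = x

mean = G.mean(axis=0)                              # empirical E_x{g_hat(y)}
mse = np.mean(np.sum((G - x) ** 2, axis=1))        # empirical MSE
bias_sq = np.sum((mean - x) ** 2)                  # ||b(g_hat; x)||_2^2
var = np.mean(np.sum((G - mean) ** 2, axis=1))     # v(g_hat; x)

print(mse, bias_sq + var)                          # agree up to rounding error
```

The cross term vanishes because the deviations from the empirical mean sum to zero, so the two printed numbers differ only by floating-point rounding.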

¹This introductory section closely parallels [29, Section II]. We include it nevertheless because it constitutes an important basis for our subsequent discussion.



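For an estimation problem $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), \mathbf{g}(\cdot))$, a fixed parameter vector $\mathbf{x}_0 \in \mathcal{X}$, and a prescribed bias function $\mathbf{c}(\cdot): \mathcal{X} \to \mathbb{R}^P$, we define the set of allowed estimators by

$$\mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0) \triangleq \big\{\hat{\mathbf{g}}(\cdot) \,\big|\, v(\hat{\mathbf{g}}(\cdot);\mathbf{x}_0) < \infty,\ \mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x}) = \mathbf{c}(\mathbf{x})\ \forall \mathbf{x} \in \mathcal{X}\big\}.$$

We call a bias function $\mathbf{c}(\cdot)$ valid for the estimation problem $\mathcal{E}$ at $\mathbf{x}_0 \in \mathcal{X}$ if the set $\mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0)$ is nonempty, which means that there is at least one estimator $\hat{\mathbf{g}}(\cdot)$ that has finite variance at $\mathbf{x}_0$ and whose bias function equals $\mathbf{c}(\cdot)$, i.e., $\mathbf{b}(\hat{\mathbf{g}}(\cdot);\mathbf{x}) = \mathbf{c}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$. For the SLGM, in particular, this definition trivially entails the following fact: If a bias function $\mathbf{c}(\cdot): \mathbb{R}^N \to \mathbb{R}^P$ is valid for $S = N$, then its restriction to $\mathcal{X}_S$ is also valid for $S < N$ (the same estimator can be used).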

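It follows from (5) that, for a fixed bias function $\mathbf{c}(\cdot)$, minimizing the MSE $\varepsilon(\hat{\mathbf{g}}(\cdot);\mathbf{x}_0)$ is equivalent to minimizing the variance $v(\hat{\mathbf{g}}(\cdot);\mathbf{x}_0)$. Let us denote the minimum (strictly speaking, infimum) variance at $\mathbf{x}_0$ for bias function $\mathbf{c}(\cdot)$ by

$$M(\mathbf{c}(\cdot),\mathbf{x}_0) \triangleq \inf_{\hat{\mathbf{g}}(\cdot) \in \mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0)} v(\hat{\mathbf{g}}(\cdot);\mathbf{x}_0). \tag{7}$$

If $\mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0)$ is empty, i.e., if $\mathbf{c}(\cdot)$ is not valid, we set $M(\mathbf{c}(\cdot),\mathbf{x}_0) \triangleq \infty$. Any estimator $\hat{\mathbf{g}}^{(\mathbf{c}(\cdot),\mathbf{x}_0)}(\cdot) \in \mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0)$ that achieves the infimum in (7), i.e., for which

$$v\big(\hat{\mathbf{g}}^{(\mathbf{c}(\cdot),\mathbf{x}_0)}(\cdot);\mathbf{x}_0\big) = M(\mathbf{c}(\cdot),\mathbf{x}_0), \tag{8}$$

is called an LMV estimator at $\mathbf{x}_0$ for bias function $\mathbf{c}(\cdot)$ [15], [16], [18]. The corresponding minimum variance $M(\mathbf{c}(\cdot),\mathbf{x}_0)$ is called the minimum achievable variance at $\mathbf{x}_0$ for bias function $\mathbf{c}(\cdot)$. The minimization problem defined by (7) is referred to as a minimum variance problem (MVP). From its definition in (7), it follows that $M(\mathbf{c}(\cdot),\mathbf{x}_0)$ is a lower bound on the variance at $\mathbf{x}_0$ of any estimator with bias function $\mathbf{c}(\cdot)$, i.e.,

$$\hat{\mathbf{g}}(\cdot) \in \mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0) \;\Rightarrow\; v(\hat{\mathbf{g}}(\cdot);\mathbf{x}_0) \ge M(\mathbf{c}(\cdot),\mathbf{x}_0).$$

This is sometimes referred to as the Barankin bound; it is the tightest possible lower bound on the variance at $\mathbf{x}_0$ of estimators with bias function $\mathbf{c}(\cdot)$.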

If, for a prescribed bias function $\mathbf{c}(\cdot)$, there exists an estimator that is the LMV estimator simultaneously at all $\mathbf{x}_0 \in \mathcal{X}$, then that estimator is termed the uniformly minimum variance (UMV) estimator for bias function $\mathbf{c}(\cdot)$ [15], [17], [18]. For the SLGM, a UMV estimator does not exist in general [13], [29]. A noteworthy exception is the SLGM where $\mathbf{H}$ has full column rank, $\mathbf{g}(\mathbf{x}) = \mathbf{x}$, $S = N$, and $\mathbf{c}(\cdot) \equiv \mathbf{0}$; here, it is well known [15], [17, Thm. 4.1] that the least squares estimator, $\hat{\mathbf{x}} = \mathbf{H}^{\dagger}\mathbf{y}$, is the UMV estimator.
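For the full-column-rank case just mentioned, the least squares estimator has covariance $\sigma^2(\mathbf{H}^T\mathbf{H})^{-1}$, so its variance in the sense of (5) is $\sigma^2 \operatorname{tr}\big((\mathbf{H}^T\mathbf{H})^{-1}\big)$. The Monte Carlo sketch below, with a randomly drawn $\mathbf{H}$ and hypothetical values for $M$, $N$, $\sigma$, and $\mathbf{x}$, checks unbiasedness and this variance value numerically; it is a sanity check only, not a proof of the UMV property.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, sigma, trials = 6, 3, 0.5, 100_000
H = rng.standard_normal((M, N))          # full column rank with probability one
x = np.array([1.0, -2.0, 0.5])           # S = N here, so x is unconstrained

H_pinv = np.linalg.pinv(H)               # Moore-Penrose pseudoinverse H^dagger
Y = (H @ x) + sigma * rng.standard_normal((trials, M))
X_hat = Y @ H_pinv.T                     # x_hat = H^dagger y for every row of Y

empirical_mean = X_hat.mean(axis=0)
empirical_var = np.mean(np.sum((X_hat - empirical_mean) ** 2, axis=1))
theory_var = sigma**2 * np.trace(np.linalg.inv(H.T @ H))   # sigma^2 tr((H^T H)^{-1})

print(empirical_mean, x)                 # unbiased: close to x
print(empirical_var, theory_var)         # close to sigma^2 tr((H^T H)^{-1})
```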

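Finally, let $\hat{g}_k(\cdot) \triangleq (\hat{\mathbf{g}}(\cdot))_k$ and $c_k(\cdot) \triangleq (\mathbf{c}(\cdot))_k$. The variance of the vector estimator $\hat{\mathbf{g}}(\cdot)$ can be decomposed as

$$v(\hat{\mathbf{g}}(\cdot);\mathbf{x}) = \sum_{k \in [P]} v(\hat{g}_k(\cdot);\mathbf{x}), \tag{9}$$

where $v(\hat{g}_k(\cdot);\mathbf{x}) \triangleq \mathrm{E}_{\mathbf{x}}\big\{[\hat{g}_k(\mathbf{y}) - \mathrm{E}_{\mathbf{x}}\{\hat{g}_k(\mathbf{y})\}]^2\big\}$ is the variance of the $k$th estimator component $\hat{g}_k(\cdot)$. Furthermore, $\hat{\mathbf{g}}(\cdot) \in \mathcal{A}(\mathbf{c}(\cdot),\mathbf{x}_0)$ if and only if $\hat{g}_k(\cdot) \in \mathcal{A}(c_k(\cdot),\mathbf{x}_0)$ for all $k \in [P]$. This shows that the MVP (7) can be reduced to $P$ separate scalar MVPs

$$M(c_k(\cdot),\mathbf{x}_0) \triangleq \inf_{\hat{g}_k(\cdot) \in \mathcal{A}(c_k(\cdot),\mathbf{x}_0)} v(\hat{g}_k(\cdot);\mathbf{x}_0), \quad k \in [P],$$

each requiring the optimization of a single scalar component $\hat{g}_k(\cdot)$ of $\hat{\mathbf{g}}(\cdot)$. Therefore, without loss of generality, we will hereafter assume that the parameter function $\mathbf{g}(\mathbf{x})$ is scalar-valued, i.e., $P = 1$ and $\mathbf{g}(\mathbf{x}) = g(\mathbf{x})$.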

IV. RKHS Fundamentals 

As mentioned in Section I, the existing variance bounds for the SLGM are not maximally tight. Using the theory of RKHSs will allow us to derive variance bounds which are tighter than the existing bounds. For the SSNM (see Section VIII), the RKHS approach even yields a precise characterization of the minimum achievable variance (Barankin bound) and of the accompanying LMV estimator. In this section, we present a review (similar in part to [29, Section III]) of some fundamentals of the theory of RKHSs and of the application of RKHSs to MVE. These fundamentals will provide a framework for our analysis of the SLGM in later sections.

A. Basic Facts 

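An RKHS is associated with a kernel function $R(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where $\mathcal{X}$ is an arbitrary set. The defining properties of a kernel function are (i) symmetry, i.e., $R(\mathbf{x}_1,\mathbf{x}_2) = R(\mathbf{x}_2,\mathbf{x}_1)$ for all $\mathbf{x}_1,\mathbf{x}_2 \in \mathcal{X}$, and (ii) positive semidefiniteness in the sense that, for every finite set $\{\mathbf{x}_1, \ldots, \mathbf{x}_L\} \subseteq \mathcal{X}$, the matrix $\mathbf{R} \in \mathbb{R}^{L \times L}$ with entries $R_{m,n} = R(\mathbf{x}_m,\mathbf{x}_n)$ is positive semidefinite. A fundamental result [14, p. 344] states that for any such kernel function $R$, there exists an RKHS $\mathcal{H}(R)$, which is a Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}(R)}$ and satisfying the following two properties:

• For any $\mathbf{x} \in \mathcal{X}$, $R(\cdot,\mathbf{x}) \in \mathcal{H}(R)$ (here, $R(\cdot,\mathbf{x})$ denotes the function $f_{\mathbf{x}}(\mathbf{x}') = R(\mathbf{x}',\mathbf{x})$ for fixed $\mathbf{x} \in \mathcal{X}$).

• For any function $f(\cdot) \in \mathcal{H}(R)$ and any $\mathbf{x} \in \mathcal{X}$,

$$\big\langle f(\cdot), R(\cdot,\mathbf{x})\big\rangle_{\mathcal{H}(R)} = f(\mathbf{x}). \tag{10}$$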

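The "reproducing property" (10) defines the inner product $\langle f_1, f_2\rangle_{\mathcal{H}(R)}$ for all $f_1(\cdot), f_2(\cdot) \in \mathcal{H}(R)$, because any $f(\cdot) \in \mathcal{H}(R)$ can be expanded into the set of functions $\{R(\cdot,\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}}$. The induced norm is $\|f\|_{\mathcal{H}(R)} = \sqrt{\langle f, f\rangle_{\mathcal{H}(R)}}$.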
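Properties (i) and (ii) can be probed on any finite set of points. In the sketch below, the example kernel $R(\mathbf{z}_1,\mathbf{z}_2) = \exp(\mathbf{z}_1^T\mathbf{z}_2)$ (the kernel $R_G$ that appears in Section IV-C) and the random points are assumptions chosen for illustration, and the helper `gram` is hypothetical; a numerical check of this kind can only refute, never prove, positive semidefiniteness.

```python
import numpy as np


def gram(kernel, points):
    """Gram matrix with entries R(x_m, x_n) for a finite set of points."""
    return np.array([[kernel(a, b) for b in points] for a in points])


def exp_inner_kernel(z1, z2):
    """Example kernel R(z1, z2) = exp(z1^T z2)."""
    return np.exp(z1 @ z2)


rng = np.random.default_rng(4)
points = 0.5 * rng.standard_normal((6, 2))   # hypothetical finite set {x_1, ..., x_6}
R = gram(exp_inner_kernel, points)

symmetric = np.allclose(R, R.T)              # property (i)
min_eig = np.linalg.eigvalsh(R).min()        # property (ii): all eigenvalues >= 0
print(symmetric, min_eig >= -1e-10)
```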

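For later use, we mention the following result [14, p. 351]. Consider a kernel function $R(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, its restriction $R'(\cdot,\cdot): \mathcal{X}' \times \mathcal{X}' \to \mathbb{R}$ to a given subdomain $\mathcal{X}' \times \mathcal{X}'$ with $\mathcal{X}' \subseteq \mathcal{X}$, and the corresponding RKHSs $\mathcal{H}(R)$ and $\mathcal{H}(R')$. Then, a function $f'(\cdot): \mathcal{X}' \to \mathbb{R}$ belongs to $\mathcal{H}(R')$ if and only if there exists a function $f(\cdot): \mathcal{X} \to \mathbb{R}$ belonging to $\mathcal{H}(R)$ whose restriction to $\mathcal{X}'$, denoted $f(\cdot)|_{\mathcal{X}'}$, equals $f'(\cdot)$. Thus, $\mathcal{H}(R')$ equals the set of functions that is obtained by restricting each function $f(\cdot) \in \mathcal{H}(R)$ to the subdomain $\mathcal{X}'$, i.e.,

$$\mathcal{H}(R') = \big\{f'(\cdot) = f(\cdot)\big|_{\mathcal{X}'} \,\big|\, f(\cdot) \in \mathcal{H}(R)\big\}. \tag{11}$$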



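Furthermore [14, p. 351], the norm of a function $f'(\cdot) \in \mathcal{H}(R')$ is equal to the minimum of the norms of all functions $f(\cdot) \in \mathcal{H}(R)$ whose restriction to $\mathcal{X}'$ equals $f'(\cdot)$, i.e.,

$$\|f'(\cdot)\|_{\mathcal{H}(R')} = \min_{\substack{f(\cdot) \in \mathcal{H}(R)\\ f(\cdot)|_{\mathcal{X}'} = f'(\cdot)}} \|f(\cdot)\|_{\mathcal{H}(R)}. \tag{12}$$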
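A finite domain gives a quick numerical illustration of (11) and (12). If $\mathcal{X}$ has finitely many points and the kernel matrix $\mathbf{K}$ is positive definite, then $\|f\|^2_{\mathcal{H}(R)} = \mathbf{f}^T\mathbf{K}^{-1}\mathbf{f}$, where $\mathbf{f}$ collects the function values (the reproducing property (10) reads $\mathbf{f}^T\mathbf{K}^{-1}\mathbf{K}\mathbf{e}_k = f_k$). In the sketch below, with a hypothetical random $\mathbf{K}$ and subdomain indices, the squared norm of $f'$ computed with the restricted kernel matrix equals the smallest squared norm over all extensions of $f'$, obtained by minimizing the quadratic form over the free function values.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7
B = rng.standard_normal((n, n))
K = B @ B.T + 0.1 * np.eye(n)          # positive definite kernel matrix on a finite X
Q = np.linalg.inv(K)                   # ||f||^2_{H(R)} = f^T K^{-1} f on a finite domain

A = np.array([0, 2, 3])                # indices forming the subdomain X'
Bc = np.setdiff1d(np.arange(n), A)     # indices of the remaining points X \ X'
f_A = rng.standard_normal(A.size)      # a function f' on X'

# Squared norm of f' in H(R'), where R' is the restriction of R to X' x X'.
norm_restricted = f_A @ np.linalg.solve(K[np.ix_(A, A)], f_A)

# Minimum squared norm over all extensions f of f': the optimal free values f_B
# solve Q_BB f_B = -Q_BA f_A (gradient of the quadratic form set to zero).
f_B = -np.linalg.solve(Q[np.ix_(Bc, Bc)], Q[np.ix_(Bc, A)] @ f_A)
f = np.empty(n)
f[A], f[Bc] = f_A, f_B
norm_min_extension = f @ Q @ f

print(norm_restricted, norm_min_extension)   # equal, as stated in (12)
```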

B. The RKHS Approach to MVE 

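RKHS theory provides a powerful mathematical framework for MVE [15]. Given an arbitrary estimation problem $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), g(\cdot))$ and a parameter vector $\mathbf{x}_0 \in \mathcal{X}$ for which $f(\mathbf{y};\mathbf{x}_0) \neq 0$, a kernel function $R_{\mathcal{E},\mathbf{x}_0}(\cdot,\cdot)$ and, in turn, an RKHS $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ can be defined as follows. We first define the likelihood ratio

$$\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}) \triangleq \frac{f(\mathbf{y};\mathbf{x})}{f(\mathbf{y};\mathbf{x}_0)}, \tag{13}$$

which is considered as a random variable (since it is a function of the random vector $\mathbf{y}$) that is parametrized by $\mathbf{x} \in \mathcal{X}$. Next, we define the Hilbert space $\mathcal{L}_{\mathcal{E},\mathbf{x}_0}$ as the closure of the linear span² of the set of random variables $\{\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}}$. The inner product in $\mathcal{L}_{\mathcal{E},\mathbf{x}_0}$ is defined by

$$\big\langle \rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_1), \rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_2)\big\rangle_{\text{RV}} \triangleq \mathrm{E}_{\mathbf{x}_0}\{\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_1)\,\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_2)\} = \mathrm{E}_{\mathbf{x}_0}\bigg\{\frac{f(\mathbf{y};\mathbf{x}_1)\,f(\mathbf{y};\mathbf{x}_2)}{f^2(\mathbf{y};\mathbf{x}_0)}\bigg\}.$$

(It can be shown that it is sufficient to define $\langle\cdot,\cdot\rangle_{\text{RV}}$ for the random variables $\{\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}}$ [15].) From now on, we consider only estimation problems $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), g(\cdot))$ such that $\langle\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_1), \rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_2)\rangle_{\text{RV}} < \infty$ for all $\mathbf{x}_1,\mathbf{x}_2 \in \mathcal{X}$, or, equivalently,

$$\mathrm{E}_{\mathbf{x}_0}\bigg\{\frac{f(\mathbf{y};\mathbf{x}_1)\,f(\mathbf{y};\mathbf{x}_2)}{f^2(\mathbf{y};\mathbf{x}_0)}\bigg\} < \infty, \quad \text{for all } \mathbf{x}_1,\mathbf{x}_2 \in \mathcal{X}.$$

Thus, $\langle\cdot,\cdot\rangle_{\text{RV}}$ is well defined. We can interpret the inner product $\langle\cdot,\cdot\rangle_{\text{RV}}: \mathcal{L}_{\mathcal{E},\mathbf{x}_0} \times \mathcal{L}_{\mathcal{E},\mathbf{x}_0} \to \mathbb{R}$ as a kernel function $R_{\mathcal{E},\mathbf{x}_0}(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$:

$$R_{\mathcal{E},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2) \triangleq \big\langle\rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_1), \rho_{\mathbf{x}_0}(\mathbf{y},\mathbf{x}_2)\big\rangle_{\text{RV}} = \mathrm{E}_{\mathbf{x}_0}\bigg\{\frac{f(\mathbf{y};\mathbf{x}_1)\,f(\mathbf{y};\mathbf{x}_2)}{f^2(\mathbf{y};\mathbf{x}_0)}\bigg\}. \tag{14}$$

The RKHS associated with the estimation problem $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), g(\cdot))$ and the parameter vector $\mathbf{x}_0 \in \mathcal{X}$ is then defined to be the RKHS induced by the kernel function $R_{\mathcal{E},\mathbf{x}_0}(\cdot,\cdot)$. We will denote this RKHS as $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$, i.e., $\mathcal{H}_{\mathcal{E},\mathbf{x}_0} \triangleq \mathcal{H}(R_{\mathcal{E},\mathbf{x}_0})$. As shown in [15], the two Hilbert spaces $\mathcal{L}_{\mathcal{E},\mathbf{x}_0}$ and $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ are isometric, and a specific congruence, i.e., isometric mapping $J[\cdot]: \mathcal{H}_{\mathcal{E},\mathbf{x}_0} \to \mathcal{L}_{\mathcal{E},\mathbf{x}_0}$, is given by

$$J\big[R_{\mathcal{E},\mathbf{x}_0}(\cdot,\mathbf{x})\big] = \rho_{\mathbf{x}_0}(\cdot,\mathbf{x}).$$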

A fundamental relation of the RKHS $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ with MVE is established by the following central result:

Theorem IV.1 ([15], [16]). Consider an estimation problem $\mathcal{E} = (\mathcal{X}, f(\mathbf{y};\mathbf{x}), g(\cdot))$, a fixed parameter vector $\mathbf{x}_0 \in \mathcal{X}$, and a prescribed bias function $c(\cdot): \mathcal{X} \to \mathbb{R}$, corresponding to the prescribed mean function $\gamma(\cdot) \triangleq c(\cdot) + g(\cdot)$. Then, the following holds:

1) The bias function $c(\cdot)$ is valid for $\mathcal{E}$ at $\mathbf{x}_0$ if and only if $\gamma(\cdot)$ belongs to the RKHS $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$.

2) If the bias function $c(\cdot)$ is valid for $\mathcal{E}$ at $\mathbf{x}_0$, the minimum achievable variance at $\mathbf{x}_0$ (Barankin bound) is given by

$$M(c(\cdot),\mathbf{x}_0) = \|\gamma(\cdot)\|^2_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}} - \gamma^2(\mathbf{x}_0), \tag{15}$$

and the LMV estimator at $\mathbf{x}_0$ is given by

$$\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot) = J[\gamma(\cdot)].$$

²For a detailed discussion of the concepts of closure, inner product, orthonormal basis, and linear span in the context of abstract Hilbert spaces, see [15] and [32].

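Based on Theorem IV.1, the following remarks can be made:

• The RKHS $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ can be interpreted as the set of the mean functions $\gamma(\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\{\hat{g}(\mathbf{y})\}$ of all estimators $\hat{g}(\cdot)$ with a finite variance at $\mathbf{x}_0$, i.e., $v(\hat{g}(\cdot);\mathbf{x}_0) < \infty$.

• The MVP (7) can be reduced to the computation of the squared norm $\|\gamma(\cdot)\|^2_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}}$ and the isometric image $J[\gamma(\cdot)]$ of the prescribed mean function $\gamma(\cdot)$, viewed as an element of the RKHS $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$. This theoretical result is especially helpful if a simple characterization of $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ is available. A simple characterization in the sense of [16] is given by an orthonormal basis for $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ such that the inner products of $\gamma(\cdot)$ with the basis functions can be computed easily.

• If a simple characterization of $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ is not available, we can still use (15) to establish a large class of lower bounds on the minimum achievable variance $M(c(\cdot),\mathbf{x}_0)$. Indeed, let $\mathcal{U} \subseteq \mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ be an arbitrary subspace of $\mathcal{H}_{\mathcal{E},\mathbf{x}_0}$ and let $P_{\mathcal{U}}\gamma(\cdot)$ denote the orthogonal projection of $\gamma(\cdot)$ onto $\mathcal{U}$. We then have $\|\gamma(\cdot)\|^2_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}} \ge \|P_{\mathcal{U}}\gamma(\cdot)\|^2_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}}$ [32, Chapter 4] and thus, from (15),

$$M(c(\cdot),\mathbf{x}_0) \ge \|P_{\mathcal{U}}\gamma(\cdot)\|^2_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}} - \gamma^2(\mathbf{x}_0). \tag{16}$$

Some well-known lower bounds on the estimator variance, such as the Cramér–Rao and Bhattacharyya bounds, are obtained from (16) by specific choices of the subspace $\mathcal{U}$ [29].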
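Choosing $\mathcal{U}$ as the span of finitely many kernel functions $R_{\mathcal{E},\mathbf{x}_0}(\cdot,\mathbf{x}_i)$ turns (16) into a finite computation: by the reproducing property (10), $\langle\gamma, R_{\mathcal{E},\mathbf{x}_0}(\cdot,\mathbf{x}_i)\rangle_{\mathcal{H}_{\mathcal{E},\mathbf{x}_0}} = \gamma(\mathbf{x}_i)$, so $\|P_{\mathcal{U}}\gamma\|^2 = \mathbf{g}^T\mathbf{K}^{-1}\mathbf{g}$ with $g_i = \gamma(\mathbf{x}_i)$ and $K_{ij} = R_{\mathcal{E},\mathbf{x}_0}(\mathbf{x}_i,\mathbf{x}_j)$. The sketch below applies this to the scalar Gaussian model $y = hx + n$, an assumption made for illustration, whose kernel is $\exp\big(h^2(x_2 - x_0)(x_1 - x_0)/\sigma^2\big)$ (the case $M = N = 1$ of (18)). For unbiased estimation with the two test points $x_0$ and $x_0 + \delta$, the bound equals $\delta^2/\big(e^{h^2\delta^2/\sigma^2} - 1\big)$ and increases toward the CRB $\sigma^2/h^2$ as $\delta \to 0$. The helper `projection_bound` and all numerical values are hypothetical.

```python
import numpy as np


def projection_bound(test_points, gamma, kernel, x0):
    """Right-hand side of (16) for U = span{R(., x_i)} over finitely many test points.

    By the reproducing property (10), <gamma, R(., x_i)> = gamma(x_i), hence
    ||P_U gamma||^2 = g^T K^{-1} g with g_i = gamma(x_i) and K_ij = R(x_i, x_j).
    """
    K = np.array([[kernel(a, b) for b in test_points] for a in test_points])
    g = np.array([gamma(t) for t in test_points])
    return g @ np.linalg.solve(K, g) - gamma(x0) ** 2


h, sigma, x0 = 2.0, 1.0, 0.3   # hypothetical scalar model y = h x + n


def kernel(x1, x2):
    """Kernel (18) for M = N = 1."""
    return np.exp(h**2 * (x2 - x0) * (x1 - x0) / sigma**2)


def gamma(x):
    """Prescribed mean function for unbiased estimation of g(x) = x."""
    return x


crb = sigma**2 / h**2
for delta in (1.0, 0.1, 0.01):
    bound = projection_bound([x0, x0 + delta], gamma, kernel, x0)
    closed_form = delta**2 / np.expm1(h**2 * delta**2 / sigma**2)
    print(f"delta={delta}: bound={bound:.6f}, closed form={closed_form:.6f}, CRB={crb}")
```

Adding more test points can only enlarge $\mathcal{U}$ and hence tighten the bound; in this unbiased linear Gaussian case the CRB is attained, so no choice of $\mathcal{U}$ can exceed it.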

C. The RKHS Associated with the LGM 

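In our analysis of the SLGM, the RKHS associated with the LGM will play an important role. Consider $\mathcal{X} = \mathbb{R}^N$ and $f(\mathbf{y};\mathbf{x}) = f_{\mathbf{H}}(\mathbf{y};\mathbf{x})$ as defined in (3), where the system matrix $\mathbf{H} \in \mathbb{R}^{M \times N}$ is not required to satisfy condition (4). The likelihood ratio (13) for $f(\mathbf{y};\mathbf{x}) = f_{\mathbf{H}}(\mathbf{y};\mathbf{x})$ is obtained as

$$\rho_{\text{LGM},\mathbf{x}_0}(\mathbf{y},\mathbf{x}) = \frac{f_{\mathbf{H}}(\mathbf{y};\mathbf{x})}{f_{\mathbf{H}}(\mathbf{y};\mathbf{x}_0)} = \exp\!\Big(-\frac{1}{2\sigma^2}\big[2\mathbf{y}^T\mathbf{H}(\mathbf{x}_0 - \mathbf{x}) + \|\mathbf{H}\mathbf{x}\|_2^2 - \|\mathbf{H}\mathbf{x}_0\|_2^2\big]\Big). \tag{17}$$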

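Furthermore, from (14), the kernel associated with the LGM follows as

$$R_{\text{LGM},\mathbf{x}_0}(\cdot,\cdot): \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}; \quad R_{\text{LGM},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2) = \exp\!\Big(\frac{1}{\sigma^2}(\mathbf{x}_2 - \mathbf{x}_0)^T\mathbf{H}^T\mathbf{H}(\mathbf{x}_1 - \mathbf{x}_0)\Big). \tag{18}$$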
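Both (17) and (18) can be sanity-checked numerically. The sketch below (hypothetical sizes, seed, and points; the helper names are illustrative) compares (17) with the ratio of the two pdfs in (3), and compares (18) with a Monte Carlo estimate of $\mathrm{E}_{\mathbf{x}_0}\{\rho_{\text{LGM},\mathbf{x}_0}(\mathbf{y},\mathbf{x}_1)\,\rho_{\text{LGM},\mathbf{x}_0}(\mathbf{y},\mathbf{x}_2)\}$, i.e., with definition (14). The points are kept close to $\mathbf{x}_0$ so that the Monte Carlo average has small variance.

```python
import numpy as np


def log_pdf(y, x, H, sigma):
    """log f_H(y; x) from (3) for a single observation y."""
    r = y - H @ x
    return -0.5 * H.shape[0] * np.log(2 * np.pi * sigma**2) - (r @ r) / (2 * sigma**2)


def rho_lgm(y, x, x0, H, sigma):
    """Likelihood ratio (17); y may be one observation (M,) or a batch (T, M)."""
    Hx, Hx0 = H @ x, H @ x0
    return np.exp(-(2 * (y @ H @ (x0 - x)) + Hx @ Hx - Hx0 @ Hx0) / (2 * sigma**2))


def kernel_lgm(x1, x2, x0, H, sigma):
    """Kernel (18)."""
    return np.exp((H @ (x2 - x0)) @ (H @ (x1 - x0)) / sigma**2)


rng = np.random.default_rng(6)
M, N, sigma, trials = 3, 4, 1.0, 200_000
H = rng.standard_normal((M, N)) / np.sqrt(M)
x0, x1, x2 = (0.1 * rng.standard_normal(N) for _ in range(3))

# (17) agrees with the ratio of the two pdfs in (3).
y = H @ x0 + sigma * rng.standard_normal(M)
ratio = np.exp(log_pdf(y, x1, H, sigma) - log_pdf(y, x0, H, sigma))
print(np.isclose(ratio, rho_lgm(y, x1, x0, H, sigma)))

# (18) versus a Monte Carlo estimate of (14), with y drawn under x0.
Y = H @ x0 + sigma * rng.standard_normal((trials, M))
mc = np.mean(rho_lgm(Y, x1, x0, H, sigma) * rho_lgm(Y, x2, x0, H, sigma))
print(mc, kernel_lgm(x1, x2, x0, H, sigma))
```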

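Let $D \triangleq \operatorname{rank}(\mathbf{H})$. We will use the thin singular value decomposition (SVD) of $\mathbf{H}$, i.e., $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\mathbf{U} \in \mathbb{R}^{M \times D}$ with $\mathbf{U}^T\mathbf{U} = \mathbf{I}$, $\mathbf{V} \in \mathbb{R}^{N \times D}$ with $\mathbf{V}^T\mathbf{V} = \mathbf{I}$, and $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ is a diagonal matrix with positive diagonal entries $(\boldsymbol{\Sigma})_{k,k} > 0$ [20]. The next theorem has been shown in [31, Sec. 5.2].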
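Theorem IV.2. Let $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ denote the RKHS associated with the LGM-based estimation problem $\mathcal{E}_{\text{LGM}} = (\mathbb{R}^N, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ and the parameter vector $\mathbf{x}_0 \in \mathbb{R}^N$, and let $\tilde{\mathbf{H}} \triangleq \mathbf{V}\boldsymbol{\Sigma}^{-1} \in \mathbb{R}^{N \times D}$. Then, the following holds:

1) Any function $f(\cdot) \in \mathcal{H}_{\text{LGM},\mathbf{x}_0}$ is invariant to translations by vectors $\mathbf{x}' \in \mathbb{R}^N$ belonging to the null space of $\mathbf{H}$, i.e., $f(\mathbf{x}) = f(\mathbf{x} + \mathbf{x}')$ for all $\mathbf{x}' \in \ker(\mathbf{H})$ and $\mathbf{x} \in \mathbb{R}^N$.

2) The RKHS $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ is isometric to the RKHS $\mathcal{H}(R_G)$ whose kernel $R_G(\cdot,\cdot): \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ is given by

$$R_G(\mathbf{z}_1,\mathbf{z}_2) = \exp(\mathbf{z}_1^T\mathbf{z}_2), \quad \mathbf{z}_1,\mathbf{z}_2 \in \mathbb{R}^D.$$

A congruence from $\mathcal{H}(R_G)$ to $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ is constituted by the mapping $K_G[\cdot]: \mathcal{H}(R_G) \to \mathcal{H}_{\text{LGM},\mathbf{x}_0}$ given by

$$K_G[f(\cdot)] = \tilde{f}(\mathbf{x}) \triangleq f\Big(\frac{1}{\sigma}\tilde{\mathbf{H}}^{\dagger}\mathbf{x}\Big) \exp\!\Big(\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2 - \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x}_0\Big), \quad \mathbf{x} \in \mathbb{R}^N, \quad \text{for all } f(\cdot) \in \mathcal{H}(R_G), \tag{19}$$

and a congruence from $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ to $\mathcal{H}(R_G)$ is constituted by the inverse mapping $K_G^{-1}[\cdot]: \mathcal{H}_{\text{LGM},\mathbf{x}_0} \to \mathcal{H}(R_G)$ given by

$$K_G^{-1}[\tilde{f}(\cdot)] = f(\mathbf{z}) = \tilde{f}(\sigma\tilde{\mathbf{H}}\mathbf{z}) \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2 + \frac{1}{\sigma}\mathbf{z}^T\tilde{\mathbf{H}}^{\dagger}\mathbf{x}_0\Big), \quad \mathbf{z} \in \mathbb{R}^D, \quad \text{for all } \tilde{f}(\cdot) \in \mathcal{H}_{\text{LGM},\mathbf{x}_0}. \tag{20}$$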
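The congruence (19) rests on a factorization of the kernel (18): with $\mathbf{z} = \frac{1}{\sigma}\tilde{\mathbf{H}}^{\dagger}\mathbf{x} = \frac{1}{\sigma}\boldsymbol{\Sigma}\mathbf{V}^T\mathbf{x}$ and $w(\mathbf{x}) = \exp\big(\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2 - \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x}_0\big)$, one has $R_{\text{LGM},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2) = w(\mathbf{x}_1)\,R_G(\mathbf{z}_1,\mathbf{z}_2)\,w(\mathbf{x}_2)$. The following numerical sketch (random $\mathbf{H}$ with $M < N$; all values and helper names hypothetical) checks this identity and result 1), the invariance of the kernel under translations by vectors in $\ker(\mathbf{H})$.

```python
import numpy as np

rng = np.random.default_rng(7)
M, N, sigma = 2, 4, 0.7
H = rng.standard_normal((M, N))            # M < N, so ker(H) is nontrivial
x0 = rng.standard_normal(N)

_, s_full, Vt_full = np.linalg.svd(H)      # full SVD; trailing rows of Vt span ker(H)
D = int(np.sum(s_full > 1e-12))            # D = rank(H)
s, V = s_full[:D], Vt_full[:D].T           # thin SVD factors: H = U Sigma V^T
H_tilde = V / s                            # V Sigma^{-1}: column k of V divided by s_k
H_tilde_pinv = np.linalg.pinv(H_tilde)     # equals Sigma V^T


def R_lgm(x1, x2):
    """Kernel (18)."""
    return np.exp((H @ (x2 - x0)) @ (H @ (x1 - x0)) / sigma**2)


def z_of(x):
    """Argument (1/sigma) H_tilde^dagger x of f in (19)."""
    return H_tilde_pinv @ x / sigma


def w(x):
    """Exponential weight appearing in (19)."""
    Hx0 = H @ x0
    return np.exp(Hx0 @ Hx0 / (2 * sigma**2) - x @ H.T @ Hx0 / sigma**2)


x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
lhs = R_lgm(x1, x2)
rhs = w(x1) * np.exp(z_of(x1) @ z_of(x2)) * w(x2)
print(np.isclose(lhs, rhs))

# Result 1): translating x1 by a null-space vector of H leaves the kernel unchanged.
k = Vt_full[D:].T @ rng.standard_normal(N - D)
print(np.allclose(H @ k, 0), np.isclose(R_lgm(x1 + k, x2), lhs))
```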

The congruence $K_G$ reduces the characterization of the RKHS $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ to that of the RKHS $\mathcal{H}(R_G)$. A simple characterization (in the sense of an orthonormal basis) of the RKHS $\mathcal{H}(R_G)$ can be obtained by noting that the kernel $R_G(\cdot,\cdot)$ is infinitely often differentiable and applying the results for RKHSs with differentiable kernels presented in [33]. This leads to the following theorem [31], [33].



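Theorem IV.3.

1) For any $\mathbf{p} \in \mathbb{Z}_+^D$, the RKHS $\mathcal{H}(R_G)$ contains the function $r^{(\mathbf{p})}(\cdot): \mathbb{R}^D \to \mathbb{R}$ given by

$$r^{(\mathbf{p})}(\mathbf{z}) \triangleq \frac{1}{\sqrt{\mathbf{p}!}}\,\frac{\partial^{\mathbf{p}} R_G(\mathbf{z},\mathbf{z}_2)}{\partial \mathbf{z}_2^{\mathbf{p}}}\bigg|_{\mathbf{z}_2 = \mathbf{0}}.$$

2) The inner product of an arbitrary function $f(\cdot) \in \mathcal{H}(R_G)$ with $r^{(\mathbf{p})}(\cdot)$ is given by

$$\big\langle f(\cdot), r^{(\mathbf{p})}(\cdot)\big\rangle_{\mathcal{H}(R_G)} = \frac{1}{\sqrt{\mathbf{p}!}}\,\frac{\partial^{\mathbf{p}} f(\mathbf{z})}{\partial \mathbf{z}^{\mathbf{p}}}\bigg|_{\mathbf{z} = \mathbf{0}}. \tag{21}$$

3) The set of functions $\{r^{(\mathbf{p})}(\cdot)\}_{\mathbf{p} \in \mathbb{Z}_+^D}$ is an orthonormal basis for $\mathcal{H}(R_G)$.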



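In particular, because of result 3, a function $f(\cdot): \mathbb{R}^D \to \mathbb{R}$ belongs to $\mathcal{H}(R_G)$ if and only if it can be written pointwise as

$$f(\mathbf{z}) = \sum_{\mathbf{p} \in \mathbb{Z}_+^D} a[\mathbf{p}]\, r^{(\mathbf{p})}(\mathbf{z}) = \sum_{\mathbf{p} \in \mathbb{Z}_+^D} \frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\,\mathbf{z}^{\mathbf{p}}, \tag{22}$$

with a unique coefficient sequence $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^D)$. The coefficient $a[\mathbf{p}]$ is given by (21), i.e.,

$$a[\mathbf{p}] = \frac{1}{\sqrt{\mathbf{p}!}}\,\frac{\partial^{\mathbf{p}} f(\mathbf{z})}{\partial \mathbf{z}^{\mathbf{p}}}\bigg|_{\mathbf{z} = \mathbf{0}}. \tag{23}$$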
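For the kernel function $f(\mathbf{z}) = R_G(\mathbf{z},\mathbf{w}) = \exp(\mathbf{z}^T\mathbf{w})$, (23) gives $a[\mathbf{p}] = \mathbf{w}^{\mathbf{p}}/\sqrt{\mathbf{p}!}$. The sketch below (hypothetical $\mathbf{w}$, $\mathbf{z}$, and truncation order; the helper `basis` is illustrative) checks two consequences of Theorem IV.3 numerically: Parseval's identity $\sum_{\mathbf{p}} a^2[\mathbf{p}] = \|f\|^2_{\mathcal{H}(R_G)}$, which by (10) equals $R_G(\mathbf{w},\mathbf{w}) = \exp(\|\mathbf{w}\|_2^2)$, and the pointwise reconstruction of $f$ via (22).

```python
import itertools
import math

import numpy as np


def basis(z, p):
    """Orthonormal basis function r^(p)(z) = z^p / sqrt(p!) of H(R_G)."""
    zp = math.prod(zi**pi for zi, pi in zip(z, p))
    return zp / math.sqrt(math.prod(math.factorial(pi) for pi in p))


w = np.array([0.8, -0.5])   # hypothetical point defining f(z) = R_G(z, w) = exp(z^T w)
z = np.array([0.3, 0.4])    # hypothetical evaluation point
max_order = 25              # truncate each p_l at 25; the neglected tail is negligible here
indices = list(itertools.product(range(max_order + 1), repeat=w.size))

# By (23), a[p] = (1/sqrt(p!)) d^p exp(z^T w)/dz^p at z = 0, i.e., w^p / sqrt(p!) = r^(p)(w).
a = {p: basis(w, p) for p in indices}

norm_sq = sum(ap**2 for ap in a.values())        # Parseval: sum of a[p]^2
f_z = sum(a[p] * basis(z, p) for p in indices)    # series (22) evaluated at z

print(norm_sq, np.exp(w @ w))   # both equal ||R_G(., w)||^2 = R_G(w, w)
print(f_z, np.exp(z @ w))       # both equal f(z)
```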



Expression (22) implies that any $f(\mathbf{z}) \in \mathcal{H}(R_G)$ is infinitely often differentiable and, because of (23), fully determined by its partial derivatives at $\mathbf{z} = \mathbf{0}$, i.e., $\frac{\partial^{\mathbf{p}} f(\mathbf{z})}{\partial \mathbf{z}^{\mathbf{p}}}\big|_{\mathbf{z}=\mathbf{0}}$ for $\mathbf{p} \in \mathbb{Z}_+^D$. Furthermore, since according to (19) any function $\tilde{f}(\cdot) \in \mathcal{H}_{\text{LGM},\mathbf{x}_0}$ is the image of a function $f(\cdot) \in \mathcal{H}(R_G)$ under the congruence $K_G[\cdot]$, it follows that also any $\tilde{f}(\cdot) \in \mathcal{H}_{\text{LGM},\mathbf{x}_0}$ is infinitely often differentiable and fully determined by its partial derivatives at $\mathbf{x} = \mathbf{0}$, i.e., $\frac{\partial^{\mathbf{p}} \tilde{f}(\mathbf{x})}{\partial \mathbf{x}^{\mathbf{p}}}\big|_{\mathbf{x}=\mathbf{0}}$ for $\mathbf{p} \in \mathbb{Z}_+^N$. (The latter fact holds because the partial derivatives of $\tilde{f}(\cdot)$ uniquely determine the partial derivatives of $f(\cdot) = K_G^{-1}[\tilde{f}(\cdot)]$ via (20) and the generalized Leibniz rule for the differentiation of a product of functions.) This agrees with the well-known result [34, Lemma 2.8] that for a statistical model of the exponential family type, the mean function of any finite-variance estimator is analytic, and thus fully determined by its partial derivatives at zero. (To appreciate the connection with the mean function of finite-variance estimators, recall from the discussion following Theorem IV.1 that the elements of $\mathcal{H}_{\text{LGM},\mathbf{x}_0}$ are the mean functions of all finite-variance estimators for the LGM, which is a special case of an exponential family.)

V. RKHS-Based Analysis of Minimum Variance Estimation for the SLGM

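In this section, we apply the RKHS framework to the SLGM-based estimation problem $\mathcal{E}_{\text{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$. Thus, the parameter set is the set of $S$-sparse vectors, $\mathcal{X} = \mathcal{X}_S \subseteq \mathbb{R}^N$ in (1), and the statistical model is given by $f(\mathbf{y};\mathbf{x}) = f_{\mathbf{H}}(\mathbf{y};\mathbf{x})$ in (3). More specifically, we consider SLGM-based MVE at a given parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$, for a prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$. We recall that the set of allowed estimators, $\mathcal{A}(c(\cdot),\mathbf{x}_0)$, consists of all estimators $\hat{g}(\cdot)$ with finite variance at $\mathbf{x}_0$, i.e., $v(\hat{g}(\cdot);\mathbf{x}_0) < \infty$, whose bias function equals $c(\cdot)$, i.e., $b(\hat{g}(\cdot);\mathbf{x}) = c(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_S$.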

Our results can be summarized as follows. We characterize the RKHS associated with the SLGM and employ it to analyze SLGM-based MVE. Using this characterization together with Theorem IV.1, we provide conditions on the prescribed bias function $c(\cdot)$ such that the minimum achievable variance is finite, i.e., we characterize the set of valid bias functions (cf. Section III). Furthermore, we present expressions of the minimum achievable variance (Barankin bound) $M_{\text{SLGM}}(c(\cdot),\mathbf{x}_0)$ and of the associated LMV estimator $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ for an arbitrary valid bias function $c(\cdot)$. Since these expressions are difficult to evaluate in general, we finally derive lower bounds on the minimum achievable variance. These lower bounds are also lower bounds on the variance of any estimator with the prescribed bias function.

A. The RKHS Associated with the SLGM 

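Let us consider the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ and the corresponding LGM-based estimation problem $\mathcal{E}_{\mathrm{LGM}} = (\mathbb{R}^N, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ with the same system matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ satisfying $\mathrm{spark}(\mathbf{H}) > S$ and with the same noise variance $\sigma^2$. For an $S$-sparse parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$, let $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ and $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ denote the RKHSs associated with the estimation problems $\mathcal{E}_{\mathrm{SLGM}}$ and $\mathcal{E}_{\mathrm{LGM}}$, respectively. Using (14), the kernel underlying $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ is obtained as

$$R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\cdot): \mathcal{X}_S \times \mathcal{X}_S \to \mathbb{R}; \quad R_{\mathrm{SLGM},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2) = \exp\!\Big(\frac{1}{\sigma^2}(\mathbf{x}_2-\mathbf{x}_0)^T\mathbf{H}^T\mathbf{H}(\mathbf{x}_1-\mathbf{x}_0)\Big). \tag{24}$$
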
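Comparing with the kernel $R_{\mathrm{LGM},\mathbf{x}_0}(\cdot,\cdot)$ underlying $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$, which was presented in (18), we conclude that $R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\cdot)$ is the restriction of $R_{\mathrm{LGM},\mathbf{x}_0}(\cdot,\cdot)$ to the subdomain $\mathcal{X}_S \times \mathcal{X}_S \subseteq \mathbb{R}^N \times \mathbb{R}^N$.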
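As a small numerical illustration of (24) (a sketch, not part of the original analysis), the following Python/NumPy snippet evaluates the kernel on a few randomly drawn $S$-sparse points and checks that the resulting Gram matrix is symmetric and positive semidefinite, as required of any reproducing kernel. The toy dimensions, noise level, seed, and the helper names `slgm_kernel` and `random_sparse` are illustrative assumptions. Because (24) has the same form as the LGM kernel, passing nonsparse arguments to the same function evaluates $R_{\mathrm{LGM},\mathbf{x}_0}$ instead.

```python
import numpy as np

def slgm_kernel(x1, x2, x0, H, sigma):
    """Kernel (24): exp((x2 - x0)^T H^T H (x1 - x0) / sigma^2).

    The formula is the LGM kernel; restricting both arguments to
    S-sparse vectors gives R_SLGM,x0.
    """
    d1 = H @ (x1 - x0)
    d2 = H @ (x2 - x0)
    return float(np.exp(d2 @ d1 / sigma**2))

def random_sparse(rng, N, S):
    """Draw an S-sparse vector with a random support (illustrative helper)."""
    x = np.zeros(N)
    support = rng.choice(N, size=S, replace=False)
    x[support] = rng.standard_normal(S)
    return x

rng = np.random.default_rng(0)
M, N, S, sigma = 4, 6, 2, 1.0              # assumed toy dimensions
H = rng.standard_normal((M, N)) / np.sqrt(M)
x0 = random_sparse(rng, N, S)
points = [random_sparse(rng, N, S) for _ in range(8)]

# Gram matrix of the kernel evaluated on S-sparse points
G = np.array([[slgm_kernel(a, b, x0, H, sigma) for b in points] for a in points])
eigvals = np.linalg.eigvalsh(G)
print("symmetric:", np.allclose(G, G.T))
print("smallest eigenvalue:", eigvals.min())
```

Note that $R_{\mathrm{SLGM},\mathbf{x}_0}(\mathbf{x}_0,\mathbf{x}_0) = 1$ for every $\mathbf{x}_0$, which the function reproduces exactly.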

The characterization of $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ provided by Theorems IV.2 and IV.3 is also relevant to $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$. This is due to the following application of the "RKHS restriction result" in Section IV-A (see (11) and (12)):



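Corollary V.1. The RKHS $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ consists of the restrictions of all functions $f(\cdot): \mathbb{R}^N \to \mathbb{R}$ contained in $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ to the subdomain $\mathcal{X}_S \subseteq \mathbb{R}^N$, i.e.,

$$\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0} = \big\{ f'(\cdot) = f(\cdot)\big|_{\mathcal{X}_S} : f(\cdot) \in \mathcal{H}_{\mathrm{LGM},\mathbf{x}_0} \big\}.$$

Furthermore, the norm of a function $f'(\cdot) \in \mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ is equal to the minimum of the norms of all functions $f(\cdot) \in \mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ whose restriction to $\mathcal{X}_S$ equals $f'(\cdot)$, i.e.,

$$\|f'(\cdot)\|_{\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}} = \min_{\substack{f(\cdot)\in\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}\\ f(\cdot)|_{\mathcal{X}_S} = f'(\cdot)}} \|f(\cdot)\|_{\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}}. \tag{25}$$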

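An immediate consequence of Corollary V.1 is the obvious fact¹ that the minimum achievable variance for the SLGM can never exceed that for the LGM (if the prescribed bias function for the SLGM is the restriction of the prescribed bias function for the LGM). Indeed, letting $c(\cdot): \mathbb{R}^N \to \mathbb{R}$ be the prescribed bias function for the LGM and $\gamma(\cdot) = c(\cdot) + g(\cdot)$ the corresponding mean function, and recalling that $\mathbf{x}_0 \in \mathcal{X}_S$, we have

$$M_{\mathrm{SLGM}}(c(\cdot)|_{\mathcal{X}_S},\mathbf{x}_0) = \|\gamma(\cdot)|_{\mathcal{X}_S}\|^2_{\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}} - \gamma^2(\mathbf{x}_0) \le \|\gamma(\cdot)\|^2_{\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}} - \gamma^2(\mathbf{x}_0) = M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0),$$

where the inequality follows from (25).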



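Thus, in the precise sense of Corollary V.1, $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ is the restriction of $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ to the set $\mathcal{X}_S$ of $S$-sparse parameter vectors, and the characterization of $\mathcal{H}_{\mathrm{LGM},\mathbf{x}_0}$ provided by Theorems IV.2 and IV.3 can also be used for a characterization of $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$. In what follows, we will employ this principle for developing an RKHS-based analysis of MVE for the SLGM. Proofs of the presented results can be found in [31]. As before, we will use the thin SVD of the system matrix $\mathbf{H}$, i.e., $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, as well as the shorthand notations $\tilde{\mathbf{H}} \triangleq \mathbf{V}\boldsymbol{\Sigma}^{-1}$ and $D \triangleq \mathrm{rank}(\mathbf{H})$.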

B. The Class of Valid Bias Functions 

The class of valid bias functions for the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ at $\mathbf{x}_0 \in \mathcal{X}_S$ is characterized by the following result [31, Thm. 5.3.1]:



¹ Indeed, prescribing the bias for all $\mathbf{x} \in \mathbb{R}^N$ (as is done within the LGM) instead of prescribing it only for the sparse vectors $\mathbf{x} \in \mathcal{X}_S$ (as is done within the SLGM) can only result in a higher (or equal) minimum achievable variance.
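Theorem V.2. A bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is valid for $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ at $\mathbf{x}_0 \in \mathcal{X}_S$ if and only if it can be expressed as

$$c(\mathbf{x}) = \exp\!\Big(\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2 - \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x}_0\Big) \sum_{\mathbf{p}\in\mathbb{Z}_+^D} \frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\Big(\frac{1}{\sigma}\tilde{\mathbf{H}}^{\dagger}\mathbf{x}\Big)^{\mathbf{p}} - g(\mathbf{x}), \quad \mathbf{x}\in\mathcal{X}_S, \tag{26}$$

with some coefficient sequence $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^D)$.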

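Theorem V.2 implies that the mean function $\gamma(\cdot) = c(\cdot) + g(\cdot)$ corresponding to a bias function $c(\cdot)$ that is valid for $\mathcal{E}_{\mathrm{SLGM}}$ at $\mathbf{x}_0 \in \mathcal{X}_S$ is of the form

$$\gamma(\mathbf{x}) = \exp\!\Big(\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2 - \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{H}^T\mathbf{H}\mathbf{x}_0\Big) \sum_{\mathbf{p}\in\mathbb{Z}_+^D} \frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\Big(\frac{1}{\sigma}\tilde{\mathbf{H}}^{\dagger}\mathbf{x}\Big)^{\mathbf{p}}, \quad \mathbf{x}\in\mathcal{X}_S, \tag{27}$$

with some coefficient sequence $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^D)$. The function on the right-hand side of (27) is analytic on the domain $\mathcal{X}_S$ in the sense² that it can be locally represented at any point $\mathbf{x} \in \mathcal{X}_S$ by a convergent power series. Thus, in particular, the mean function $\gamma(\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\{\hat{g}(\mathbf{y})\}$ of any finite-variance estimator $\hat{g}(\mathbf{y})$ is necessarily an "analytic" function. Again, this agrees with the general result about the mean function of estimators for exponential families presented in [34, Lemma 2.8]. (Note that the statistical model of the SLGM is a special case of an exponential family.)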

In the special case where $g(\mathbf{x}) = x_k$ for some $k \in [N]$, a sufficient condition on a bias function to be valid is stated as follows [31, Thm. 5.3.4]:

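Theorem V.3. The function

$$c(\mathbf{x}) = \exp\big(\mathbf{x}_1^T\tilde{\mathbf{H}}^{\dagger}\mathbf{x}\big) \sum_{\mathbf{p}\in\mathbb{Z}_+^D} \frac{a[\mathbf{p}]}{\mathbf{p}!}\Big(\frac{1}{\sigma}\tilde{\mathbf{H}}^{\dagger}\mathbf{x}\Big)^{\mathbf{p}} - x_k, \quad \mathbf{x}\in\mathcal{X}_S, \tag{28}$$

with an arbitrary $\mathbf{x}_1 \in \mathbb{R}^D$ and coefficients $a[\mathbf{p}]$ satisfying $|a[\mathbf{p}]| \le C^{|\mathbf{p}|}$ with an arbitrary constant $C \in \mathbb{R}_+$, is a valid bias function for $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ at any $\mathbf{x}_0 \in \mathcal{X}_S$. In particular, for $\mathbf{H} = \mathbf{I}$, the unbiased case (i.e., $c(\mathbf{x}) \equiv 0$) is obtained for $\mathbf{x}_1 = \mathbf{0}$, $a[\mathbf{e}_k] = \sigma$, and $a[\mathbf{p}] = 0$ for all other $\mathbf{p} \in \mathbb{Z}_+^D$.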

Note that the difference of the factors in (28) compared to the factors in (26) (i.e., $1/\mathbf{p}!$ instead of $1/\sqrt{\mathbf{p}!}$) is in accordance with the different condition on the coefficient sequence $a[\mathbf{p}]$ (i.e., $|a[\mathbf{p}]| \le C^{|\mathbf{p}|}$ instead of $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^D)$).
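For the unbiased special case of Theorem V.3, a direct substitution confirms why the stated coefficient choice gives $c(\mathbf{x}) \equiv 0$. This is a short worked check assuming $\mathbf{H} = \mathbf{I}$, so that $D = N$ and $\tilde{\mathbf{H}}^{\dagger} = \mathbf{I}$, and reading (28) as given above; only the term $\mathbf{p} = \mathbf{e}_k$ survives, and $\mathbf{e}_k! = 1$:

```latex
\begin{aligned}
c(\mathbf{x})
  &= \exp(\mathbf{0}^T\mathbf{x}) \sum_{\mathbf{p}\in\mathbb{Z}_+^N}
     \frac{a[\mathbf{p}]}{\mathbf{p}!}\Big(\frac{\mathbf{x}}{\sigma}\Big)^{\mathbf{p}} - x_k
   = \frac{a[\mathbf{e}_k]}{\mathbf{e}_k!}\,\frac{x_k}{\sigma} - x_k \\
  &= \sigma\,\frac{x_k}{\sigma} - x_k = 0, \qquad \mathbf{x}\in\mathcal{X}_S .
\end{aligned}
```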

C. Minimum Achievable Variance (Barankin Bound) and LMV Estimator 

Let us consider the MVP (7) at a given parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$ for an SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ and for a prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$, which is known to be valid. Then, the minimum achievable variance (Barankin bound) at $\mathbf{x}_0$, denoted $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$, and the corresponding LMV estimator $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ are characterized by the following theorem [31, Thm. 5.3.1].

² Note that a function with domain $\mathcal{X}_S$, with $S < N$, cannot be analytic in the conventional sense, since the domain of an analytic function has to be open by definition [19, Definition 2.2.1].
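Theorem V.4. Consider an SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\cdot))$ and a valid prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$. Then:

1) The minimum achievable variance at $\mathbf{x}_0 \in \mathcal{X}_S$ is given by

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) = \min_{a[\cdot]\in\mathcal{C}(c)} \|a[\cdot]\|^2_{\ell^2(\mathbb{Z}_+^D)} - \gamma^2(\mathbf{x}_0), \tag{29}$$

where $\gamma(\cdot) = c(\cdot) + g(\cdot)$, $\|a[\cdot]\|^2_{\ell^2(\mathbb{Z}_+^D)} = \sum_{\mathbf{p}\in\mathbb{Z}_+^D} a^2[\mathbf{p}]$, and $\mathcal{C}(c) \subseteq \ell^2(\mathbb{Z}_+^D)$ denotes the set of coefficient sequences $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^D)$ that are consistent with (26).

2) The function $\hat{g}(\cdot): \mathbb{R}^M \to \mathbb{R}$ given by

$$\hat{g}(\mathbf{y}) = \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{H}\mathbf{x}_0\|_2^2\Big) \sum_{\mathbf{p}\in\mathbb{Z}_+^D} \frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\,\chi_{\mathbf{p}}(\mathbf{y}), \tag{30}$$

with an arbitrary coefficient sequence $a[\cdot] \in \mathcal{C}(c)$ and

$$\chi_{\mathbf{p}}(\mathbf{y}) \triangleq \frac{\partial^{\mathbf{p}}\Big[\rho_{\mathrm{LGM},\mathbf{x}_0}(\mathbf{y},\sigma\tilde{\mathbf{H}}\mathbf{z})\exp\!\big(\frac{1}{\sigma}\mathbf{x}_0^T\mathbf{H}^T\mathbf{H}\tilde{\mathbf{H}}\mathbf{z}\big)\Big]}{\partial\mathbf{z}^{\mathbf{p}}}\bigg|_{\mathbf{z}=\mathbf{0}},$$

where $\rho_{\mathrm{LGM},\mathbf{x}_0}(\mathbf{y},\mathbf{x})$ is given by (11), is an allowed estimator at $\mathbf{x}_0$ for $c(\cdot)$, i.e., $\hat{g}(\cdot) \in \mathcal{A}(c(\cdot),\mathbf{x}_0)$.

3) The LMV estimator at $\mathbf{x}_0$, $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$, is given by (30) using the specific coefficient sequence $a_0[\mathbf{p}] = \arg\min_{a[\cdot]\in\mathcal{C}(c)} \|a[\cdot]\|^2_{\ell^2(\mathbb{Z}_+^D)}$.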

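The kernel $R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\cdot)$ given by (24) is pointwise continuous with respect to the parameter $\mathbf{x}_0$, i.e., $\lim_{\mathbf{x}_0'\to\mathbf{x}_0} R_{\mathrm{SLGM},\mathbf{x}_0'}(\mathbf{x}_1,\mathbf{x}_2) = R_{\mathrm{SLGM},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2)$ for all $\mathbf{x}_0,\mathbf{x}_1,\mathbf{x}_2 \in \mathcal{X}_S$. Therefore, applying [29, Thm. IV.6] to the SLGM yields the following result.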

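Corollary V.5. Consider the SLGM with parameter function $g(\mathbf{x}) = x_k$ and a prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ that is valid for $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ at each parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$. Then, if $c(\cdot)$ is continuous, the minimum achievable variance $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ is a lower semi-continuous³ function of $\mathbf{x}_0$.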



From Corollary V.5, we can conclude that the sparse CRB derived in [11] is not tight, i.e., it is not equal to the minimum achievable variance $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$. Indeed, the sparse CRB is in general a strictly upper semi-continuous function of the parameter vector $\mathbf{x}_0$, whereas the minimum achievable variance $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ is lower semi-continuous according to Corollary V.5. Since a function cannot be simultaneously strictly upper semi-continuous and lower semi-continuous, the sparse CRB cannot be equal to $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ in general.

VI. Lower Variance Bounds for the SLGM 

While Theorem V.4 provides a mathematically complete characterization of the minimum achievable variance and the LMV estimator, the corresponding expressions are somewhat difficult to evaluate in general. Therefore, we will next derive lower bounds on the minimum achievable variance $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ for the estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ with some $k \in [N]$ and for a prescribed bias function $c(\cdot)$. These bounds are easier to evaluate. As mentioned before, they are also lower bounds on the variance of any estimator having the prescribed bias function. Our assumption that $g(\mathbf{x}) = x_k$ is no restriction because, according to [31, Thm. 2.3.1], the MVP for a given parameter function $g(\mathbf{x})$ and prescribed bias function $c(\mathbf{x})$ is equivalent to the MVP for parameter function $g'(\mathbf{x}) = x_k$ and prescribed bias function $c'(\mathbf{x}) = c(\mathbf{x}) + g(\mathbf{x}) - x_k$. In particular,⁴ if $c'(\mathbf{x})$ is valid for the MVP with parameter function $g'(\mathbf{x}) = x_k$, then $c(\mathbf{x}) = c'(\mathbf{x}) - g(\mathbf{x}) + x_k$ is valid for the MVP with parameter function $g(\mathbf{x})$. Therefore, any MVP can be reduced to an equivalent MVP with $g(\mathbf{x}) = x_k$ and an appropriately modified prescribed bias function.

³ A definition of lower semi-continuity can be found in [35].

We assume that the prescribed bias function $c(\cdot)$ is valid for $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$. This validity assumption is no real restriction either, since our lower bounds are finite and therefore are lower bounds also if $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) = \infty$, which, by our definition in Section III, is the case if $c(\cdot)$ is not valid.

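The lower bounds to be presented are based on the generic lower bound (16), i.e., they are of the form

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \|\mathrm{P}_{\mathcal{U}}\gamma(\cdot)\|^2_{\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}} - \gamma^2(\mathbf{x}_0), \tag{31}$$

for some subspace $\mathcal{U} \subseteq \mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$. Here, the prescribed mean function $\gamma(\cdot): \mathcal{X}_S \to \mathbb{R}$, given by $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k$, is an element of $\mathcal{H}_{\mathrm{SLGM},\mathbf{x}_0}$ since $c(\cdot)$ is assumed valid (recall Theorem IV.1).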

A. The Sparse CRB 

The first bound is an adaptation of the CRB [18], [29] to the sparse setting and has been previously derived in a slightly different form in [11].



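Theorem VI.1. Consider the estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ with a system matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ satisfying $\mathrm{spark}(\mathbf{H}) > S$. Let $\mathbf{x}_0 \in \mathcal{X}_S$. If the prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is such that the partial derivatives $\frac{\partial c(\mathbf{x})}{\partial x_l}\big|_{\mathbf{x}=\mathbf{x}_0}$ exist for all $l \in [N]$, then

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \begin{cases} \sigma^2\,\mathbf{b}^T(\mathbf{H}^T\mathbf{H})^{\dagger}\mathbf{b}, & \|\mathbf{x}_0\|_0 < S, \\ \sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathbf{x}_0}^T\mathbf{H}_{\mathbf{x}_0})^{-1}\mathbf{b}_{\mathbf{x}_0}, & \|\mathbf{x}_0\|_0 = S. \end{cases} \tag{32}$$

Here, in the case $\|\mathbf{x}_0\|_0 < S$, $\mathbf{b} \in \mathbb{R}^N$ is given by $b_l = \delta_{k,l} + \frac{\partial c(\mathbf{x})}{\partial x_l}\big|_{\mathbf{x}=\mathbf{x}_0}$, $l \in [N]$, and in the case $\|\mathbf{x}_0\|_0 = S$, $\mathbf{b}_{\mathbf{x}_0} \in \mathbb{R}^S$ and $\mathbf{H}_{\mathbf{x}_0} \in \mathbb{R}^{M\times S}$ consist of those entries of $\mathbf{b}$ and columns of $\mathbf{H}$, respectively, that are indexed by $\mathrm{supp}(\mathbf{x}_0) = \{k_1,\ldots,k_S\}$, i.e., $(\mathbf{b}_{\mathbf{x}_0})_i = b_{k_i}$ and $(\mathbf{H}_{\mathbf{x}_0})_{m,i} = (\mathbf{H})_{m,k_i}$, $i \in [S]$.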

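⁴ Indeed, if $c'(\mathbf{x})$ is valid at $\mathbf{x}_0$ for the MVP with parameter function $x_k$, there exists a finite-variance estimator $\hat{g}(\cdot)$ with mean function $\mathrm{E}_{\mathbf{x}}\{\hat{g}(\mathbf{y})\} = c'(\mathbf{x}) + x_k$. For the MVP with parameter function $g(\cdot)$, that estimator $\hat{g}(\cdot)$ has the bias function

$$b(\hat{g}(\cdot);\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\{\hat{g}(\mathbf{y})\} - g(\mathbf{x}) = c'(\mathbf{x}) + x_k - g(\mathbf{x}) = c(\mathbf{x}).$$

Thus, there exists a finite-variance estimator with bias function $c(\mathbf{x}) = c'(\mathbf{x}) - g(\mathbf{x}) + x_k$, which implies that the bias function $c(\cdot)$ is valid for the MVP with parameter function $g(\cdot)$.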
A proof of this theorem is given in [31, Thm. 5.4.1]. There, it is shown that the bound (32) for $\|\mathbf{x}_0\|_0 < S$ is obtained from the generic bound (31) using the subspace $\mathcal{U} = \mathrm{span}\{u_0(\cdot), \{u_l(\cdot)\}_{l\in[N]}\}$, where

$$u_0(\cdot) = R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\mathbf{x}_0), \qquad u_l(\cdot) = \frac{\partial R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\mathbf{x}_2)}{\partial x_{2,l}}\bigg|_{\mathbf{x}_2=\mathbf{x}_0},$$

with $R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\cdot)$ given by (24), and the bound (32) for $\|\mathbf{x}_0\|_0 = S$ is obtained from (31) using the subspace $\mathcal{U}' = \mathrm{span}\{u_0(\cdot), \{u_l(\cdot)\}_{l\in\mathrm{supp}(\mathbf{x}_0)}\}$. This establishes a new, RKHS-based interpretation of the bound in [11] in terms of the projection of the prescribed mean function $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k$ onto an RKHS-related subspace $\mathcal{U}$. We note that the bound in [11] was formulated as a bound on the variance $v(\hat{\mathbf{x}}(\cdot);\mathbf{x}_0)$ of a vector-valued estimator $\hat{\mathbf{x}}(\cdot)$ of $\mathbf{x}$ (and not only of the $k$th entry $x_k$). Consistent with (9), that bound can be reobtained by summing our bound in (32) (with $c(\cdot) = c_k(\cdot)$) over all $k \in [N]$. Thus, the two bounds are equivalent.
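The RKHS interpretation can be checked numerically. The sketch below, an illustration under assumed toy values rather than the procedure of [31], builds the Gram matrix of $u_0(\cdot)$ and $\{u_l(\cdot)\}_{l\in[N]}$ from central finite differences of the kernel (24), uses the reproducing property ($\langle\gamma,u_0\rangle = \gamma(\mathbf{x}_0)$ and $\langle\gamma,u_l\rangle = \partial\gamma(\mathbf{x})/\partial x_l|_{\mathbf{x}=\mathbf{x}_0} = b_l$) to form $\|\mathrm{P}_{\mathcal{U}}\gamma(\cdot)\|^2 - \gamma^2(\mathbf{x}_0)$, and compares the result with the first case of (32). It assumes $\|\mathbf{x}_0\|_0 < S$ (so every perturbed point $\mathbf{x}_0 \pm h\mathbf{e}_l$ stays $S$-sparse), unbiased estimation of $x_k$ (so $\mathbf{b} = \mathbf{e}_k$), and a full-column-rank $\mathbf{H}$ so that the Gram matrix is invertible; the step size `h` and the function names are illustrative choices.

```python
import numpy as np

def kernel(x1, x2, x0, H, sigma):
    """Kernel (24) of the SLGM RKHS."""
    return np.exp((H @ (x2 - x0)) @ (H @ (x1 - x0)) / sigma**2)

def projection_bound(x0, H, sigma, gamma0, b, h=1e-3):
    """||P_U gamma||^2 - gamma(x0)^2 for U = span{u_0, u_1, ..., u_N} (case ||x0||_0 < S)."""
    N = H.shape[1]
    E = np.eye(N)

    def R(a, c):
        return kernel(a, c, x0, H, sigma)

    G = np.zeros((N + 1, N + 1))
    G[0, 0] = R(x0, x0)                       # <u_0, u_0> = R(x0, x0)
    for l in range(N):
        # <u_0, u_l>: derivative of R(x0, x2) w.r.t. x2_l at x2 = x0 (central difference)
        G[0, l + 1] = G[l + 1, 0] = (R(x0, x0 + h * E[l]) - R(x0, x0 - h * E[l])) / (2 * h)
        for m in range(N):
            # <u_l, u_m>: mixed second derivative of R at (x0, x0)
            G[l + 1, m + 1] = (R(x0 + h * E[l], x0 + h * E[m])
                               - R(x0 + h * E[l], x0 - h * E[m])
                               - R(x0 - h * E[l], x0 + h * E[m])
                               + R(x0 - h * E[l], x0 - h * E[m])) / (4 * h**2)
    # Reproducing property: <gamma, u_0> = gamma(x0), <gamma, u_l> = b_l
    v = np.concatenate(([gamma0], b))
    return v @ np.linalg.solve(G, v) - gamma0**2

rng = np.random.default_rng(1)
M, N, sigma, k = 5, 3, 0.5, 1                 # assumed toy setting; take S = 2
H = rng.standard_normal((M, N))               # full column rank with probability one
x0 = np.array([0.7, 0.0, 0.0])                # ||x0||_0 = 1 < S
b = np.eye(N)[k]                              # unbiased estimation of x_k: b = e_k
via_rkhs = projection_bound(x0, H, sigma, gamma0=x0[k], b=b)
closed_form = sigma**2 * b @ np.linalg.pinv(H.T @ H) @ b   # first case of (32)
print(via_rkhs, closed_form)
```

The Gram matrix comes out as approximately $\mathrm{diag}(1, \mathbf{H}^T\mathbf{H}/\sigma^2)$, which is why the projection reproduces $\sigma^2\mathbf{b}^T(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{b}$.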

An important aspect of Theorem VI.1 is that the lower variance bound in (32) is not a continuous function of $\mathbf{x}_0$ on $\mathcal{X}_S$ in general. Indeed, for the case $\mathbf{H} = \mathbf{I}$ and $c(\cdot) \equiv 0$, which has been considered in [13], it can be verified that the bound is a strictly upper semi-continuous function of $\mathbf{x}_0$. For example, for $M = N = 2$, $\mathbf{H} = \mathbf{I}$, $c(\cdot) \equiv 0$, $S = 1$, $k = 2$, and $\mathbf{x}_0 = a\cdot(1,0)^T$ with $a \in \mathbb{R}_+$, the bound is equal to $\sigma^2$ for $a = 0$ (case $\|\mathbf{x}_0\|_0 < S$) but equal to $0$ for all $a > 0$ (case $\|\mathbf{x}_0\|_0 = S$). However, by Corollary V.5, the minimum achievable variance $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ is a lower semi-continuous function of $\mathbf{x}_0$. It thus follows that the bound in (32) cannot be tight, i.e., it cannot be equal to $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ for all $\mathbf{x}_0 \in \mathcal{X}_S$, which means that we have a strict inequality in (32) at least for some $\mathbf{x}_0 \in \mathcal{X}_S$.
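A direct implementation of the piecewise bound (32) makes the discontinuity described above easy to reproduce. The sketch below uses a hypothetical helper `sparse_crb` (not from the source), written in Python with NumPy; it assumes $\mathbf{x}_0 \in \mathcal{X}_S$ and that the caller supplies $\mathbf{b}$ (for unbiased estimation of $x_k$, $\mathbf{b} = \mathbf{e}_k$). Python indices are 0-based, so the text's $k = 2$ becomes `k = 1`.

```python
import numpy as np

def sparse_crb(x0, H, sigma, b, S):
    """Evaluate the sparse CRB (32) at x0 (assumes ||x0||_0 <= S)."""
    support = np.flatnonzero(x0)
    if support.size > S:
        raise ValueError("x0 is not S-sparse")
    if support.size < S:
        # case ||x0||_0 < S: full vector b and pseudo-inverse of H^T H
        return sigma**2 * b @ np.linalg.pinv(H.T @ H) @ b
    # case ||x0||_0 = S: restrict b and the columns of H to supp(x0)
    Hx = H[:, support]
    bx = b[support]
    return sigma**2 * bx @ np.linalg.solve(Hx.T @ Hx, bx)

# Example from the text: M = N = 2, H = I, c = 0, S = 1, k = 2 (index 1 here)
sigma, S, k = 1.0, 1, 1
H = np.eye(2)
b = np.eye(2)[k]
values = {a: sparse_crb(a * np.array([1.0, 0.0]), H, sigma, b, S)
          for a in [0.0, 1e-6, 1.0]}
print(values)   # sigma^2 at a = 0, and 0 for every a > 0
```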

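Let us finally consider the special case where $M \ge N$ and $\mathbf{H} \in \mathbb{R}^{M\times N}$ has full rank, i.e., $\mathrm{rank}(\mathbf{H}) = N$. The least-squares (LS) estimator [27] of $x_k$ is given by $\hat{x}_{\mathrm{LS},k}(\mathbf{y}) = \mathbf{e}_k^T\mathbf{H}^{\dagger}\mathbf{y}$; it is unbiased and its variance is

$$v(\hat{x}_{\mathrm{LS},k}(\cdot);\mathbf{x}_0) = \sigma^2\,\mathbf{e}_k^T(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{e}_k. \tag{33}$$

On the other hand, for unbiased estimation, i.e., $c(\cdot) \equiv 0$, we have $\mathbf{b} = \mathbf{e}_k$, and since $\mathbf{H}^T\mathbf{H}$ is invertible, our lower bound for $\|\mathbf{x}_0\|_0 < S$ in (32) becomes $M_{\mathrm{SLGM}}(c(\cdot) \equiv 0,\mathbf{x}_0) \ge \sigma^2\mathbf{b}^T(\mathbf{H}^T\mathbf{H})^{\dagger}\mathbf{b} = \sigma^2\mathbf{e}_k^T(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{e}_k$. Comparing with (33), we conclude that our bound is tight and the minimum achievable variance is in fact

$$M_{\mathrm{SLGM}}(c(\cdot) \equiv 0,\mathbf{x}_0) = \sigma^2\,\mathbf{e}_k^T(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{e}_k,$$

which is achieved by the LS estimator. Thus, for $M \ge N$ and $\mathrm{rank}(\mathbf{H}) = N$, the LS estimator is the⁵ LMV unbiased estimator for the SLGM at each parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$ with $\|\mathbf{x}_0\|_0 < S$. It is interesting to note that the LS estimator does not exploit the sparsity information expressed by the parameter set $\mathcal{X}_S$, i.e., the knowledge that $\|\mathbf{x}\|_0 \le S$, and that it has the constant variance (33) for each $\mathbf{x}_0 \in \mathcal{X}_S$ (in fact, even for $\mathbf{x}_0 \in \mathbb{R}^N$). We also note that the LS estimator is not an LMV unbiased estimator for the case $\|\mathbf{x}_0\|_0 = S$; therefore, it is not a UMV unbiased estimator on $\mathcal{X}_S$ (i.e., an unbiased estimator with minimum variance at each $\mathbf{x}_0 \in \mathcal{X}_S$). In fact, as shown in [13] and [31], there does not exist a UMV unbiased estimator for the SLGM in general.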
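The closed form (33) can be checked by simulation. The sketch below draws noisy observations from the SLGM for an assumed full-column-rank $\mathbf{H}$ with $M \ge N$ (all numbers are illustrative), applies $\hat{x}_{\mathrm{LS},k}(\mathbf{y}) = \mathbf{e}_k^T\mathbf{H}^{\dagger}\mathbf{y}$, and compares the empirical mean and variance with $x_{0,k}$ and with (33).

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, k, sigma = 8, 4, 2, 0.3          # assumed toy setting with M >= N
H = rng.standard_normal((M, N))          # full column rank with probability one
x0 = np.array([0.0, 1.5, 0.0, -0.7])    # a 2-sparse parameter vector

# LS estimator of x_k, e_k^T H^+ y, and its variance (33)
w = np.linalg.pinv(H)[k]
var_theory = sigma**2 * np.linalg.inv(H.T @ H)[k, k]

trials = 200_000
Y = H @ x0 + sigma * rng.standard_normal((trials, M))   # each row is one observation y
estimates = Y @ w
print("mean:", estimates.mean(), "(true x_k =", x0[k], ")")
print("variance:", estimates.var(), "vs (33):", var_theory)
```

The empirical variance does not depend on $\mathbf{x}_0$, in line with the remark that the LS estimator ignores the sparsity information.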

⁵ If an LMV estimator exists, it is unique [18].
B. A Novel CRB-Type Lower Variance Bound

A novel lower bound on $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ is stated in the following theorem [36].

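Theorem VI.2. Consider the estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ with a system matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ satisfying $\mathrm{spark}(\mathbf{H}) > S$. Let $\mathbf{x}_0 \in \mathcal{X}_S$, and consider an arbitrary index set $\mathcal{K} = \{k_1,\ldots,k_{|\mathcal{K}|}\} \subseteq [N]$ consisting of no more than $S$ indices, i.e., $|\mathcal{K}| \le S$. If the prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is such that the partial derivatives $\frac{\partial c(\mathbf{x})}{\partial x_{k_i}}\big|_{\mathbf{x}=\mathbf{x}_0}$ exist for all $k_i \in \mathcal{K}$, then⁶

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2\Big)\Big[\sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0} + \gamma^2(\tilde{\mathbf{x}}_0)\Big] - \gamma^2(\mathbf{x}_0). \tag{34}$$

Here, $\mathbf{P} \triangleq \mathbf{H}_{\mathcal{K}}\mathbf{H}_{\mathcal{K}}^{\dagger} \in \mathbb{R}^{M\times M}$, $\mathbf{b}_{\mathbf{x}_0} \in \mathbb{R}^{|\mathcal{K}|}$ is defined elementwise as $(\mathbf{b}_{\mathbf{x}_0})_i = \delta_{k,k_i} + \frac{\partial c(\mathbf{x})}{\partial x_{k_i}}\big|_{\mathbf{x}=\mathbf{x}_0}$ for $i \in [|\mathcal{K}|]$, $\tilde{\mathbf{x}}_0 \in \mathbb{R}^N$ is defined as the unique (due to $\mathrm{spark}(\mathbf{H}) > S$) vector with $\mathrm{supp}(\tilde{\mathbf{x}}_0) \subseteq \mathcal{K}$ solving $\mathbf{H}\tilde{\mathbf{x}}_0 = \mathbf{P}\mathbf{H}\mathbf{x}_0$, and $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k$.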

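According to [31, Thm. 5.4.3], the bound in (34) follows from the generic bound (31) by using the subspace $\mathcal{U} = \mathrm{span}\{\tilde{u}_0(\cdot), \{u_l(\cdot)\}_{l\in\mathcal{K}}\}$, where

$$\tilde{u}_0(\cdot) = R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\tilde{\mathbf{x}}_0), \qquad u_l(\cdot) = \frac{\partial R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\mathbf{x}_2)}{\partial x_{2,l}}\bigg|_{\mathbf{x}_2=\mathbf{x}_0}, \quad l \in \mathcal{K}.$$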



We note that the bound presented in [36] is obtained by maximizing (34) with respect to the index set $\mathcal{K}$; this gives the tightest possible bound of the type (34).

For the special case given by the SSNM, i.e., $\mathbf{H} = \mathbf{I}$, and unbiased estimation, i.e., $c(\cdot) \equiv 0$, the bound (34) is a continuous function of $\mathbf{x}_0$ on $\mathcal{X}_S$. This is an important difference from the bound given in Theorem VI.1 and, also, from the bound to be given in Theorem VIII.8. Furthermore, still for $\mathbf{H} = \mathbf{I}$ and $c(\cdot) \equiv 0$, the bound (34) can be shown [36], [31, p. 106] to be tighter (higher) than the bounds in Theorem VI.1 and Theorem VIII.8.

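The matrix $\mathbf{P}$ appearing in (34) is the orthogonal projection matrix [20] onto the subspace $\mathcal{H}_{\mathcal{K}} = \mathrm{span}(\mathbf{H}_{\mathcal{K}}) \subseteq \mathbb{R}^M$, i.e., the subspace spanned by those columns of $\mathbf{H}$ whose indices are in $\mathcal{K}$. Consequently, $\mathbf{I} - \mathbf{P}$ is the orthogonal projection matrix onto the orthogonal complement of $\mathcal{H}_{\mathcal{K}}$, and the norm $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2$ thus represents the distance between the point $\mathbf{H}\mathbf{x}_0$ and the subspace $\mathcal{H}_{\mathcal{K}}$ [32]. Therefore, the factor $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2)$ appearing in the bound (34) can be interpreted as a measure of the distance between $\mathbf{H}\mathbf{x}_0$ and $\mathcal{H}_{\mathcal{K}}$. In general, the bound (34) is tighter (i.e., higher) if $\mathcal{K}$ is chosen such that the distance $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2$ is smaller.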

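A slight modification in the derivation of (34) yields the following alternative bound:

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2\Big)\,\sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}. \tag{35}$$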

⁶ Note that $(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}$ exists because of $\mathrm{spark}(\mathbf{H}) > S$.

As shown in [31, Thm. 5.4.4], this bound follows from the generic lower bound (31) by using the subspace $\mathcal{U} = \mathrm{span}\{u_0(\cdot), \{u_l(\cdot)\}_{l\in\mathcal{K}}\}$, with $u_0(\cdot) = R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\mathbf{x}_0)$ and $u_l(\cdot) = \frac{\partial R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\mathbf{x}_2)}{\partial x_{2,l}}\big|_{\mathbf{x}_2=\mathbf{x}_0}$ as defined previously. Note that this subspace deviates from the subspace underlying the bound (34) only by the use of $u_0(\cdot)$ instead of $\tilde{u}_0(\cdot)$. The difference of the bounds (35) and (34) is

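$$\Delta_{(35)-(34)} = \gamma^2(\mathbf{x}_0) - \exp\!\Big(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2\Big)\gamma^2(\tilde{\mathbf{x}}_0). \tag{36}$$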

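This depends on the choice of the index set $\mathcal{K}$ (via $\mathbf{P}$ and $\tilde{\mathbf{x}}_0$). If, for some $\mathcal{K}$ and $c(\cdot)$, $\gamma^2(\tilde{\mathbf{x}}_0) \approx \gamma^2(\mathbf{x}_0)$, then $\Delta_{(35)-(34)}$ is approximately nonnegative since $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2) \le 1$. Hence, in that case, the bound (35) is tighter (higher) than the bound (34). We note that one sufficient condition for $\gamma^2(\tilde{\mathbf{x}}_0) \approx \gamma^2(\mathbf{x}_0)$ is that the columns of $\mathbf{H}_{\mathcal{K}}$ are nearly orthonormal and $c(\cdot) \equiv 0$, i.e., unbiased estimation.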
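The quantities entering (34), (35), and (36) are straightforward to compute for a given index set $\mathcal{K}$. The following sketch does this for unbiased estimation ($c(\cdot) \equiv 0$, so $\gamma(\mathbf{x}) = x_k$ and $(\mathbf{b}_{\mathbf{x}_0})_i = \delta_{k,k_i}$); the helper name `crb_type_bounds`, the random column-normalized $\mathbf{H}$, and the chosen $\mathbf{x}_0$ are assumptions for illustration. The vector $\tilde{\mathbf{x}}_0$ is obtained by solving $\mathbf{H}_{\mathcal{K}}\mathbf{z} = \mathbf{P}\mathbf{H}\mathbf{x}_0$ in the least-squares sense, which is exact here because $\mathbf{P}\mathbf{H}\mathbf{x}_0$ lies in the column space of $\mathbf{H}_{\mathcal{K}}$.

```python
import numpy as np

def crb_type_bounds(x0, H, sigma, k, K):
    """Evaluate (34), (35) and their difference (36) for unbiased estimation of x_k.

    Unbiased case: c = 0, so gamma(x) = x_k and (b_x0)_i = delta_{k, k_i}.
    """
    K = list(K)
    HK = H[:, K]
    P = HK @ np.linalg.pinv(HK)                 # orthogonal projector onto span(H_K)
    r = (np.eye(H.shape[0]) - P) @ H @ x0
    factor = np.exp(-(r @ r) / sigma**2)        # exp(-||(I - P) H x0||^2 / sigma^2)
    bK = np.array([1.0 if ki == k else 0.0 for ki in K])
    crb_K = sigma**2 * bK @ np.linalg.solve(HK.T @ HK, bK)
    # x~0: unique vector supported on K with H x~0 = P H x0
    x_t = np.zeros(H.shape[1])
    x_t[K] = np.linalg.lstsq(HK, P @ H @ x0, rcond=None)[0]
    b34 = factor * (crb_K + x_t[k] ** 2) - x0[k] ** 2
    b35 = factor * crb_K
    return b34, b35, b35 - b34, factor, x_t

rng = np.random.default_rng(3)
M, N, S, sigma, k = 4, 6, 2, 1.0, 3    # assumed toy setting; k is not in supp(x0)
H = rng.standard_normal((M, N))
H /= np.linalg.norm(H, axis=0)
x0 = np.zeros(N)
x0[0] = 1.2                            # ||x0||_0 = 1 < S

for K in ([0, 3], [3]):                # K = supp(x0) plus {k}, and K = {k}
    b34, b35, diff, factor, x_t = crb_type_bounds(x0, H, sigma, k, K)
    print(K, "factor=%.4f  (34)=%.4f  (35)=%.4f" % (factor, b34, b35))
```

With $\mathcal{K} = \mathrm{supp}(\mathbf{x}_0)\cup\{k\}$ the scaling factor equals one and $\tilde{\mathbf{x}}_0 = \mathbf{x}_0$; with $\mathcal{K} = \{k\}$ the factor drops below one.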

The bounds (34) and (35) have an intuitively appealing interpretation in terms of a scaled CRB for an LGM. Indeed, the quantity $\sigma^2\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}$ appearing in (34) and (35) can be interpreted as the CRB [11] for the LGM with parameter dimension $N = |\mathcal{K}|$, parameter function $g(\mathbf{x}) = x_k$, and prescribed bias function $c(\cdot)$. For a discussion of the scaling factor $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2)$, we will consider the following two complementary cases:

1) For the case where either $k \in \mathrm{supp}(\mathbf{x}_0)$ or $\|\mathbf{x}_0\|_0 < S$ (or both), the factor $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2)$ can be made equal to 1 by choosing $\mathcal{K} = \mathrm{supp}(\mathbf{x}_0) \cup \{k\}$.

2) On the other hand, consider the complementary case where $k \notin \mathrm{supp}(\mathbf{x}_0)$ and $\|\mathbf{x}_0\|_0 = S$. Choosing $\mathcal{K} = \mathcal{L} \cup \{k\}$, where $\mathcal{L}$ comprises the indices of the $S-1$ largest (in magnitude) entries of $\mathbf{x}_0$, we obtain $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2 = \xi_0^2\,\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{e}_{j_0}\|_2^2$, where $\xi_0$ and $j_0$ denote the value and index, respectively, of the smallest (in magnitude) nonzero entry of $\mathbf{x}_0$. Typically,⁷ $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{e}_{j_0}\|_2 > 0$, and therefore, as $\xi_0$ becomes larger (in magnitude), the bound (35) transitions from a "low signal-to-noise ratio (SNR)" regime, where $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2) \approx 1$, to a "high-SNR" regime, where $\exp(-\frac{1}{\sigma^2}\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2) \approx 0$. In the low-SNR regime, the bound (35) is approximately equal to $\sigma^2\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}$, i.e., to the CRB for the LGM with $N = |\mathcal{K}|$. In the high-SNR regime, the bound becomes approximately equal to 0; this suggests that the zero entries $x_k$ with $k \notin \mathrm{supp}(\mathbf{x})$ can be estimated with small variance. Note that for increasing $\xi_0$, the transition from the low-SNR regime to the high-SNR regime exhibits an exponential decay.
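Case 2) can be illustrated numerically. The sketch below (an assumed random $\mathbf{H}$ with unit-norm columns, $S = 2$, one large entry of fixed magnitude, and unbiased estimation) checks the identity $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2 = \xi_0^2\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{e}_{j_0}\|_2^2$ and shows how the scaling factor, and hence the bound (35), decays as $|\xi_0|$ grows. All sizes, indices, and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, S, sigma = 6, 8, 2, 1.0                 # assumed toy setting
H = rng.standard_normal((M, N))
H /= np.linalg.norm(H, axis=0)
k, big, j0 = 5, 0, 1                          # k not in supp(x0) = {big, j0}
K = [big, k]                                  # L = {big} (the S-1 largest entries) plus k
HK = H[:, K]
Q = np.eye(M) - HK @ np.linalg.pinv(HK)       # I - P
crb_K = sigma**2 * np.linalg.inv(HK.T @ HK)[1, 1]   # b_x0 = (0, 1) for unbiased estimation

xis = [0.1, 0.5, 1.0, 2.0, 4.0]
factors = []
for xi0 in xis:
    x0 = np.zeros(N)
    x0[big], x0[j0] = 10.0, xi0               # xi0 is the smallest nonzero entry
    lhs = np.sum((Q @ H @ x0) ** 2)
    rhs = xi0**2 * np.sum((Q @ H[:, j0]) ** 2)
    assert np.isclose(lhs, rhs)               # ||(I-P)Hx0||^2 = xi0^2 ||(I-P)He_j0||^2
    factors.append(np.exp(-lhs / sigma**2))
    print(f"xi0={xi0:4.1f}  factor={factors[-1]:.3e}  bound (35)={factors[-1] * crb_K:.3e}")
```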

VII. The SLGM View of Compressed Sensing 

The lower bounds of Section VI are also relevant to the linear CS recovery problem, which can be viewed as an instance of the SLGM-based estimation problem. In this section, we express one of these lower bounds in terms of the restricted isometry constant of the system matrix (CS measurement matrix) $\mathbf{H}$.

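⁷ Note that, for the case $k \notin \mathrm{supp}(\mathbf{x}_0)$ and $\|\mathbf{x}_0\|_0 = S$ considered, $j_0 \notin \mathcal{K}$ with $|\mathcal{K}| \le S$. For a system matrix $\mathbf{H}$ satisfying $\mathrm{spark}(\mathbf{H}) > S$, we then have $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{e}_{j_0}\|_2 > 0$ if and only if the submatrix $\mathbf{H}_{\mathcal{K}\cup\{j_0\}}$ has full column rank.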
A. CS Fundamentals

The compressive measurement process within a CS problem is often modeled as [5], [37], [38]

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}. \tag{37}$$

Here, $\mathbf{y} \in \mathbb{R}^M$ denotes the compressive measurements; $\mathbf{H} \in \mathbb{R}^{M\times N}$, where $M < N$ and typically $M \ll N$, denotes the CS measurement matrix; $\mathbf{x} \in \mathcal{X}_S \subseteq \mathbb{R}^N$ is an unknown $S$-sparse signal or parameter vector, with known sparsity degree $S$ (typically $S \ll N$); and $\mathbf{n}$ represents additive measurement noise. We assume that $\mathbf{n} \sim \mathcal{N}(\mathbf{0},\sigma^2\mathbf{I})$ and that the columns $\{\mathbf{h}_j\}_{j\in[N]}$ of $\mathbf{H}$ are normalized, i.e., $\|\mathbf{h}_j\|_2 = 1$ for all $j \in [N]$. The CS measurement model (37) is then identical to the SLGM observation model. Any CS recovery method,⁸ such as the Basis Pursuit (BP) [37] or the Orthogonal Matching Pursuit (OMP), can be interpreted as an estimator $\hat{\mathbf{x}}(\mathbf{y})$ that estimates the sparse vector $\mathbf{x}$ from the observation $\mathbf{y}$.

Due to the typically large dimension of the measurement matrix $\mathbf{H}$, a complete characterization of the properties of $\mathbf{H}$ (e.g., via its SVD) is often infeasible. Useful incomplete characterizations are provided by the (mutual) coherence and the restricted isometry property [37], [38]. The coherence of a matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ is defined as

$$\mu(\mathbf{H}) \triangleq \max_{j\neq j'} |\mathbf{h}_j^T\mathbf{h}_{j'}|.$$

Furthermore, a matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ is said to satisfy the restricted isometry property (RIP) of order $K$ if for every index set $\mathcal{I} \subseteq [N]$ of size $|\mathcal{I}| = K$ there is a constant $\delta'_K \in \mathbb{R}_+$ such that

$$(1-\delta'_K)\|\mathbf{z}\|_2^2 \le \|\mathbf{H}_{\mathcal{I}}\mathbf{z}\|_2^2 \le (1+\delta'_K)\|\mathbf{z}\|_2^2, \quad \text{for all } \mathbf{z} \in \mathbb{R}^K. \tag{38}$$

The smallest $\delta'_K$ for which (38) holds, hereafter denoted $\delta_K$, is called the RIP constant of $\mathbf{H}$. The condition $\mathrm{spark}(\mathbf{H}) > S$ is necessary for a matrix $\mathbf{H}$ to have the RIP of order $S$ with an RIP constant $\delta_S < 1$.⁹ It can be easily verified that $\delta_{K'} \ge \delta_K$ for $K' \ge K$. The coherence $\mu(\mathbf{H})$ provides a coarser description of the matrix $\mathbf{H}$ than the RIP constant $\delta_K$ but can be calculated more easily. The two parameters are related according to $\delta_K \le (K-1)\mu(\mathbf{H})$.
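For small matrices, both parameters can be computed directly. The sketch below computes $\mu(\mathbf{H})$ and, by brute force over all index sets (feasible only for small $N$), the RIP constant $\delta_K$ of (38) as the largest deviation of the eigenvalues of $\mathbf{H}_{\mathcal{I}}^T\mathbf{H}_{\mathcal{I}}$ from one; it then checks $\delta_K \le (K-1)\mu(\mathbf{H})$ and the monotonicity in $K$ for an assumed random matrix with normalized columns. The function names, sizes, and seed are hypothetical.

```python
import itertools
import numpy as np

def coherence(H):
    """mu(H) = max_{j != j'} |h_j^T h_j'| (columns assumed unit-norm)."""
    G = np.abs(H.T @ H)
    np.fill_diagonal(G, 0.0)
    return G.max()

def rip_constant(H, K):
    """Brute-force RIP constant delta_K per (38); only feasible for small N."""
    delta = 0.0
    for I in itertools.combinations(range(H.shape[1]), K):
        HI = H[:, list(I)]
        eig = np.linalg.eigvalsh(HI.T @ HI)       # ascending eigenvalues
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

rng = np.random.default_rng(5)
M, N = 8, 12                                      # assumed toy dimensions
H = rng.standard_normal((M, N))
H /= np.linalg.norm(H, axis=0)                    # normalized columns, as assumed in the text
mu = coherence(H)
deltas = {K: rip_constant(H, K) for K in (1, 2, 3)}
for K, d in deltas.items():
    print(f"delta_{K} = {d:.3f}   (K-1)*mu = {(K - 1) * mu:.3f}")
```

The inequality $\delta_K \le (K-1)\mu(\mathbf{H})$ reflects Gershgorin's theorem applied to $\mathbf{H}_{\mathcal{I}}^T\mathbf{H}_{\mathcal{I}}$, whose diagonal is one and whose off-diagonal entries are bounded by $\mu(\mathbf{H})$.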



B. A Lower Variance Bound 

We now specialize the bound (35) on the minimum achievable variance for $\mathcal{E}_{\mathrm{SLGM}}$ to the CS scenario, i.e., to the SLGM with sparsity degree $S$ and a system matrix $\mathbf{H}$ that is a CS measurement matrix (i.e., $M < N$) with known RIP constant $\delta_S < 1$. Note that $\delta_S < 1$ implies that the condition $\mathrm{spark}(\mathbf{H}) > S$ is satisfied. The following result was presented in [31, Thm. 5.7.2].

⁸ A comprehensive overview is provided at http://dsp.rice.edu/cs.

⁹ Indeed, assume that $\mathrm{spark}(\mathbf{H}) \le S$. This means that there exists an index set $\mathcal{I} \subseteq [N]$ consisting of $S$ indices such that the columns of $\mathbf{H}_{\mathcal{I}}$ are linearly dependent. This, in turn, implies that there is a nonzero coefficient vector $\mathbf{z} \in \mathbb{R}^S$ such that $\mathbf{H}_{\mathcal{I}}\mathbf{z} = \mathbf{0}$ and consequently $\|\mathbf{H}_{\mathcal{I}}\mathbf{z}\|_2^2 = 0$. Therefore, there cannot exist a constant $\delta'_S < 1$ satisfying (38) for all $\mathbf{z} \in \mathbb{R}^S$.
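Theorem VII.1. Consider the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, where $\mathbf{H} \in \mathbb{R}^{M\times N}$ with $M < N$ satisfies the RIP of order $S$ with RIP constant $\delta_S < 1$. Let $\mathbf{x}_0 \in \mathcal{X}_S$, and consider an arbitrary index set $\mathcal{K} \subseteq [N]$ consisting of no more than $S$ indices, i.e., $|\mathcal{K}| \le S$. If the first-order partial derivatives $\frac{\partial c(\mathbf{x})}{\partial x_l}\big|_{\mathbf{x}=\mathbf{x}_0}$ of the prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ exist for all $l \in \mathcal{K}$, then

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1+\delta_S}{\sigma^2}\big\|\mathbf{x}_0^{\mathrm{supp}(\mathbf{x}_0)\setminus\mathcal{K}}\big\|_2^2\Big)\,\sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}, \tag{39}$$

with $\mathbf{b}_{\mathbf{x}_0} \in \mathbb{R}^{|\mathcal{K}|}$ as defined in Theorem VI.2.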

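Using the inequality $\delta_S \le (S-1)\mu(\mathbf{H})$, we obtain from (39) the coherence-based bound

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1+(S-1)\mu(\mathbf{H})}{\sigma^2}\big\|\mathbf{x}_0^{\mathrm{supp}(\mathbf{x}_0)\setminus\mathcal{K}}\big\|_2^2\Big)\,\sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}.$$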
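A sketch comparing (35), (39), and the coherence-based bound for one assumed toy configuration follows. The helper `bounds_39` is hypothetical, the RIP constant is obtained by brute force (feasible only for small problems), and unbiased estimation is assumed, so $(\mathbf{b}_{\mathbf{x}_0})_i = \delta_{k,k_i}$. One way to see the expected ordering: (39) replaces $\|(\mathbf{I}-\mathbf{P})\mathbf{H}\mathbf{x}_0\|_2^2$ in (35) by the upper bound $(1+\delta_S)\|\mathbf{x}_0^{\mathrm{supp}(\mathbf{x}_0)\setminus\mathcal{K}}\|_2^2$, and the coherence version additionally uses $\delta_S \le (S-1)\mu(\mathbf{H})$, so the three values should satisfy coherence version $\le$ (39) $\le$ (35).

```python
import itertools
import numpy as np

def rip_constant(H, K):
    """Brute-force RIP constant delta_K of (38); only feasible for small problems."""
    delta = 0.0
    for I in itertools.combinations(range(H.shape[1]), K):
        HI = H[:, list(I)]
        eig = np.linalg.eigvalsh(HI.T @ HI)
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

def bounds_39(x0, H, sigma, K, bK, S):
    """Return the bound (35), the RIP bound (39), and its coherence-based variant."""
    HK = H[:, K]
    crb_K = sigma**2 * bK @ np.linalg.solve(HK.T @ HK, bK)
    r = H @ x0 - HK @ np.linalg.lstsq(HK, H @ x0, rcond=None)[0]   # (I - P) H x0
    off = np.setdiff1d(np.flatnonzero(x0), K)                       # supp(x0) \ K
    energy = np.sum(x0[off] ** 2)
    G = np.abs(H.T @ H)
    np.fill_diagonal(G, 0.0)
    mu = G.max()                                                    # coherence
    b35 = np.exp(-(r @ r) / sigma**2) * crb_K
    b39 = np.exp(-(1 + rip_constant(H, S)) * energy / sigma**2) * crb_K
    b_coh = np.exp(-(1 + (S - 1) * mu) * energy / sigma**2) * crb_K
    return b35, b39, b_coh

rng = np.random.default_rng(9)
M, N, S, sigma, k = 40, 50, 3, 0.5, 5                 # assumed toy CS setting (M < N)
H = rng.standard_normal((M, N))
H /= np.linalg.norm(H, axis=0)
x0 = np.zeros(N)
x0[[0, 1, 2]] = [2.0, -1.5, 0.4]                      # ||x0||_0 = S, k not in supp(x0)
K = [0, 1, k]                                         # S-1 largest entries of x0, plus k
bK = np.array([0.0, 0.0, 1.0])                        # unbiased case
b35, b39, b_coh = bounds_39(x0, H, sigma, K, bK, S)
print(f"(35): {b35:.3e}   (39): {b39:.3e}   coherence-based: {b_coh:.3e}")
```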

If we want to compare the actual variance behavior of a given CS recovery scheme (or, estimator) $\hat{x}_k(\cdot)$ with the bound on the minimum achievable variance in (39), then we have to ensure that the first-order partial derivatives of the estimator's bias function $\mathrm{E}_{\mathbf{x}}\{\hat{x}_k(\mathbf{y})\} - x_k$ exist. The following lemma states that this is indeed the case under mild conditions. Moreover, the lemma gives an explicit expression of these partial derivatives.

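Lemma VII.2 ([34, Cor. 2.6]). Consider the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ and an estimator $\hat{x}_k(\cdot): \mathbb{R}^M \to \mathbb{R}$. If the mean function $\gamma(\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\{\hat{x}_k(\mathbf{y})\}$ exists for all $\mathbf{x} \in \mathcal{X}_S$, then also the partial derivatives $\frac{\partial b(\hat{x}_k(\cdot);\mathbf{x})}{\partial x_l}$, $l \in [N]$, exist for all $\mathbf{x} \in \mathcal{X}_S$ and are given by

$$\frac{\partial b(\hat{x}_k(\cdot);\mathbf{x})}{\partial x_l} = -\delta_{k,l} + \frac{1}{\sigma^2}\,\mathrm{E}_{\mathbf{x}}\big\{\hat{x}_k(\mathbf{y})(\mathbf{y}-\mathbf{H}\mathbf{x})^T\mathbf{H}\mathbf{e}_l\big\}. \tag{40}$$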
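Formula (40) lends itself to a Monte Carlo check. For an assumed linear estimator $\hat{x}_k(\mathbf{y}) = \mathbf{w}^T\mathbf{y}$ (here the hypothetical choice $\mathbf{w} = \mathbf{h}_k$), the bias gradient is known exactly, $\partial b/\partial x_l = \mathbf{w}^T\mathbf{H}\mathbf{e}_l - \delta_{k,l}$, so a sample average of the right-hand side of (40) can be compared against it. The helper accepts any estimator function of the observation matrix, so nonlinear CS recovery schemes could be plugged in the same way; the sizes, seed, and sample count are illustrative.

```python
import numpy as np

def bias_gradient_mc(estimator, x, H, sigma, k, trials, rng):
    """Monte Carlo estimate of the right-hand side of (40), for all l in [N] at once."""
    M, N = H.shape
    noise = sigma * rng.standard_normal((trials, M))
    Y = H @ x + noise                                   # rows are observations y = Hx + n
    est = estimator(Y)                                  # x_hat_k(y) for each row
    # (1/sigma^2) E_x{ x_hat_k(y) (y - Hx)^T H e_l }, using y - Hx = n
    corr = (est[:, None] * (noise @ H)).mean(axis=0) / sigma**2
    return corr - np.eye(N)[k]                          # minus delta_{k,l}

rng = np.random.default_rng(6)
M, N, k, sigma = 6, 10, 2, 0.5                          # assumed toy CS setting, M < N
H = rng.standard_normal((M, N))
H /= np.linalg.norm(H, axis=0)
x = np.zeros(N)
x[[2, 7]] = [1.0, -0.8]                                 # a 2-sparse parameter vector
w = H[:, k]                                             # hypothetical linear estimator w^T y
grad_mc = bias_gradient_mc(lambda Y: Y @ w, x, H, sigma, k, 200_000, rng)
grad_exact = H.T @ w - np.eye(N)[k]                     # exact bias gradient of w^T y
print(np.round(grad_mc, 3))
print(np.round(grad_exact, 3))
```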

C. The Case $\delta_S \approx 0$

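For CS applications, measurement matrices $\mathbf{H}$ with RIP constant close to zero, i.e., $\delta_S \approx 0$, are generally preferable [38], [41]-[43]. For $\delta_S = 0$, the bound in (39) becomes

$$M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\big\|\mathbf{x}_0^{\mathrm{supp}(\mathbf{x}_0)\setminus\mathcal{K}}\big\|_2^2\Big)\,\sigma^2\,\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}. \tag{41}$$

This is equal to the bound (57) for the SSNM (i.e., $\mathbf{H} = \mathbf{I}$) except that the factor $\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0}$ in (41) is replaced by $\|\mathbf{b}_{\mathbf{x}_0}\|_2^2$ in (57). For a "good" CS measurement matrix, i.e., with $\delta_S \approx 0$, we have $\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0} \approx \|\mathbf{b}_{\mathbf{x}_0}\|_2^2$ for any index set $\mathcal{K} \subseteq [N]$ of size $|\mathcal{K}| \le S$. Thus, the bound in (41) is very close to (57). This means that, in terms of a lower bound on the achievable estimation accuracy, relative to the SSNM (case $\mathbf{H} = \mathbf{I}$), no loss of information is incurred by multiplying $\mathbf{x}$ by the CS measurement matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ and thereby reducing the signal dimension from $N$ to $M$, where typically $M \ll N$. This agrees with the fact that if $\delta_S \approx 0$, one can recover (e.g., by using the BP) the sparse parameter vector $\mathbf{x} \in \mathcal{X}_S$ from the compressed observation $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ up to an error that is typically very small and whose norm is almost independent of $\mathbf{H}$ and solely determined by the measurement noise $\mathbf{n}$ [7], [24].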
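The approximation $\mathbf{b}_{\mathbf{x}_0}^T(\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}})^{-1}\mathbf{b}_{\mathbf{x}_0} \approx \|\mathbf{b}_{\mathbf{x}_0}\|_2^2$ can be visualized with random Gaussian matrices with normalized columns, a common example of matrices whose RIP constants tend to be small when $M$ is large relative to $S$. In the sketch below (sizes and seed are assumptions, and a random vector stands in for $\mathbf{b}_{\mathbf{x}_0}$), the ratio of the two quantities always lies between the reciprocals of the extreme eigenvalues of $\mathbf{H}_{\mathcal{K}}^T\mathbf{H}_{\mathcal{K}}$, which themselves lie in $[1-\delta_S, 1+\delta_S]$; as $M$ grows, this interval shrinks toward one.

```python
import numpy as np

rng = np.random.default_rng(7)
N, S = 1000, 4                               # assumed sizes
results = []
for M in (50, 200, 800):
    H = rng.standard_normal((M, N))
    H /= np.linalg.norm(H, axis=0)           # unit-norm columns, as in the CS model
    K = rng.choice(N, size=S, replace=False) # an arbitrary index set with |K| = S
    b = rng.standard_normal(S)               # arbitrary stand-in for b_x0
    G = H[:, K].T @ H[:, K]
    ratio = b @ np.linalg.solve(G, b) / (b @ b)
    eig = np.linalg.eigvalsh(G)
    results.append((M, ratio, 1.0 / eig[-1], 1.0 / eig[0]))
    print(f"M={M:4d}  ratio={ratio:.3f}  Rayleigh range=[{1 / eig[-1]:.3f}, {1 / eig[0]:.3f}]")
```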
VIII. RKHS-Based Analysis of Minimum Variance Estimation for the SSNM

Next, we specialize our RKHS-based MVE analysis to the SSNM, i.e., to the special case given by $\mathbf{H} = \mathbf{I}$ (which implies $M = N$ and $\mathbf{y} = \mathbf{x} + \mathbf{n}$). For the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\cdot))$ with $k \in [N]$, we will analyze the minimum achievable variance $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0)$ and the corresponding LMV estimator. We note that the SLGM with a system matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$ having orthonormal columns, i.e., satisfying $\mathbf{H}^T\mathbf{H} = \mathbf{I}$, is equivalent to the SSNM [13].

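Specializing the kernel $R_{\mathrm{SLGM},\mathbf{x}_0}(\cdot,\cdot)$ (see (24)) to the system matrix $\mathbf{H} = \mathbf{I}$, we obtain

$$R_{\mathrm{SSNM},\mathbf{x}_0}(\mathbf{x}_1,\mathbf{x}_2) = \exp\!\Big(\frac{1}{\sigma^2}(\mathbf{x}_2-\mathbf{x}_0)^T(\mathbf{x}_1-\mathbf{x}_0)\Big), \quad \mathbf{x}_0,\mathbf{x}_1,\mathbf{x}_2 \in \mathcal{X}_S. \tag{42}$$

The corresponding RKHS, $\mathcal{H}(R_{\mathrm{SSNM},\mathbf{x}_0})$, will be briefly denoted by $\mathcal{H}_{\mathrm{SSNM},\mathbf{x}_0}$.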



A. Valid Bias Functions, Minimum Achievable Variance, and LMV Estimator 

Since the SSNM is a special case of the SLGM, we can characterize the class of valid bias functions, the minimum achievable variance (Barankin bound), and the corresponding LMV estimator by Theorems V.2 and V.4 specialized to $\mathbf{H} = \mathbf{I}$, as stated in the following corollary.

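Corollary VIII.1. Consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\cdot))$ with $k \in [N]$.

1) A bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is valid for $\mathcal{E}_{\mathrm{SSNM}}$ at $\mathbf{x}_0 \in \mathcal{X}_S$ if and only if it can be expressed as

$$c(\mathbf{x}) = \exp\!\Big(\frac{1}{2\sigma^2}\|\mathbf{x}_0\|_2^2 - \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{x}_0\Big)\sum_{\mathbf{p}\in\mathbb{Z}_+^N}\frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\Big(\frac{\mathbf{x}}{\sigma}\Big)^{\mathbf{p}} - g(\mathbf{x}), \quad \mathbf{x}\in\mathcal{X}_S, \tag{43}$$

with some coefficient sequence $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^N)$.

2) Let $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ be a valid prescribed bias function. Then:

a) The minimum achievable variance at $\mathbf{x}_0 \in \mathcal{X}_S$, $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0)$, is given by (29), in which $\mathcal{C}(c) \subseteq \ell^2(\mathbb{Z}_+^N)$ denotes the set of coefficient sequences $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^N)$ that are consistent with (43).

b) The function $\hat{g}(\cdot): \mathbb{R}^N \to \mathbb{R}$ given by

$$\hat{g}(\mathbf{y}) = \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{x}_0\|_2^2\Big)\sum_{\mathbf{p}\in\mathbb{Z}_+^N}\frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\,\chi_{\mathbf{p}}(\mathbf{y}), \tag{44}$$

with an arbitrary coefficient sequence $a[\cdot] \in \mathcal{C}(c)$ and

$$\chi_{\mathbf{p}}(\mathbf{y}) \triangleq \frac{\partial^{\mathbf{p}}\Big[\rho_{\mathrm{LGM},\mathbf{x}_0}(\mathbf{y},\sigma\mathbf{x})\exp\!\big(\frac{1}{\sigma}\mathbf{x}^T\mathbf{x}_0\big)\Big]}{\partial\mathbf{x}^{\mathbf{p}}}\bigg|_{\mathbf{x}=\mathbf{0}},$$

is an allowed estimator at $\mathbf{x}_0$ for $c(\cdot)$, i.e., $\hat{g}(\cdot) \in \mathcal{A}(c(\cdot),\mathbf{x}_0)$.

c) The LMV estimator at $\mathbf{x}_0$, $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$, is given by (44) using the specific coefficient sequence $a_0[\mathbf{p}] = \arg\min_{a[\cdot]\in\mathcal{C}(c)}\|a[\cdot]\|^2_{\ell^2(\mathbb{Z}_+^N)}$.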
However, a more convenient characterization can be obtained by exploiting the specific structure of $\mathcal{H}_{\mathrm{SSNM},\mathbf{x}_0}$ that is induced by the choice $\mathbf{H} = \mathbf{I}$. We omit the technical details, which can be found in [31, Sec. 5.5], and just present the main results regarding MVE [31, Thm. 5.5.2].



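Theorem VIII.2. Consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\cdot))$ with $k \in [N]$.

1) A prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is valid for $\mathcal{E}_{\mathrm{SSNM}}$ at $\mathbf{x}_0 \in \mathcal{X}_S$ if and only if the associated prescribed mean function $\gamma(\cdot) = c(\cdot) + g(\cdot)$ can be expressed as

$$\gamma(\mathbf{x})\,\psi_{\mathbf{x}_0}(\mathbf{x}) = \sum_{\mathbf{p}\in\mathbb{Z}_+^N\cap\mathcal{X}_S}\frac{a[\mathbf{p}]}{\sqrt{\mathbf{p}!}}\Big(\frac{\mathbf{x}}{\sigma}\Big)^{\mathbf{p}}, \quad \mathbf{x}\in\mathcal{X}_S,$$

with

$$\psi_{\mathbf{x}_0}(\mathbf{x}) = \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{x}_0\|_2^2 + \frac{1}{\sigma^2}\mathbf{x}^T\mathbf{x}_0\Big)$$

and with a coefficient sequence $a[\mathbf{p}] \in \ell^2(\mathbb{Z}_+^N \cap \mathcal{X}_S)$. This coefficient sequence is unique for a given $c(\cdot)$.

2) Let $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ be a valid prescribed bias function. Then:

a) The minimum achievable variance at $\mathbf{x}_0 \in \mathcal{X}_S$ is given by

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) = \sum_{\mathbf{p}\in\mathbb{Z}_+^N\cap\mathcal{X}_S} a_{\mathbf{x}_0}^2[\mathbf{p}] - \gamma^2(\mathbf{x}_0), \tag{45}$$

with

$$a_{\mathbf{x}_0}[\mathbf{p}] = \frac{1}{\sqrt{\mathbf{p}!}}\,\frac{\partial^{\mathbf{p}}\big(\gamma(\sigma\mathbf{x})\,\psi_{\mathbf{x}_0}(\sigma\mathbf{x})\big)}{\partial\mathbf{x}^{\mathbf{p}}}\bigg|_{\mathbf{x}=\mathbf{0}}.$$

b) The LMV estimator at $\mathbf{x}_0$ is given by

$$\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y}) = \sum_{\mathbf{p}\in\mathbb{Z}_+^N\cap\mathcal{X}_S} a_{\mathbf{x}_0}[\mathbf{p}]\,\frac{\partial^{\mathbf{p}}\phi_{\mathbf{x}_0}(\mathbf{x},\mathbf{y})}{\partial\mathbf{x}^{\mathbf{p}}}\bigg|_{\mathbf{x}=\mathbf{0}}, \tag{46}$$

with a function $\phi_{\mathbf{x}_0}(\mathbf{x},\mathbf{y})$ whose explicit form is given in [31, Thm. 5.5.2].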

Note that the statement of Theorem VIII.2 is stronger than that of Corollary VIII.1 because it contains explicit expressions of the minimum achievable variance $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0)$ and the corresponding LMV estimator $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y})$.

The expression (45) nicely shows the influence of the sparsity constraints on the minimum achievable variance. Indeed, consider a prescribed bias $c(\cdot): \mathbb{R}^N \to \mathbb{R}$ that is valid for the SSNM with $S = N$, and therefore also for the SSNM with $S < N$. Let us denote by $M_N$ and $M_S$ the minimum achievable variance $M(c(\cdot),\mathbf{x}_0)$ for the degenerate SSNM without sparsity ($S = N$) and for the SSNM with sparsity ($S < N$), respectively. Note that in the nonsparse case $S = N$, the SSNM coincides with the LGM with system matrix $\mathbf{H} = \mathbf{I}$. It then follows from (45) that $M_N = \sum_{\mathbf{p}\in\mathbb{Z}_+^N} a_{\mathbf{x}_0}^2[\mathbf{p}] - \gamma^2(\mathbf{x}_0)$ and

$$M_N - M_S = \sum_{\mathbf{p}\in\mathbb{Z}_+^N\setminus\mathcal{X}_S} a_{\mathbf{x}_0}^2[\mathbf{p}]. \tag{47}$$
Clearly, if $\mathbf{x}$ is more sparse, i.e., if the sparsity degree $S$ is smaller, the number of (nonnegative) terms in the above sum is larger. This implies a larger (or at least not smaller) difference $M_N - M_S$ and, thus, a stronger reduction of the minimum achievable variance due to the sparsity information.
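For unbiased estimation of $x_k$ (so $\gamma(\mathbf{x}) = x_k$), the coefficients in (45) can be worked out by hand. Expanding $\gamma(\sigma\mathbf{x})\psi_{\mathbf{x}_0}(\sigma\mathbf{x}) = \sigma x_k\,e^{-\|\mathbf{x}_0\|_2^2/(2\sigma^2)}\exp(\mathbf{x}^T\mathbf{x}_0/\sigma)$ in a power series gives, with $t_j = x_{0,j}^2/\sigma^2$, $a_{\mathbf{x}_0}^2[\mathbf{p}] = \sigma^2 e^{-\|\mathbf{x}_0\|_2^2/\sigma^2}\,\frac{p_k}{(p_k-1)!}\,t_k^{p_k-1}\prod_{j\neq k} t_j^{p_j}/p_j!$ for $p_k \ge 1$, and $a_{\mathbf{x}_0}[\mathbf{p}] = 0$ otherwise. The sketch below (this derivation and the function name are ours, not the source's; the truncation order is an assumption) sums these terms over the multi-indices $\mathbf{p}$ with at most $S$ nonzero entries to evaluate (45), and hence $M_N - M_S$ in (47), for a toy two-dimensional example.

```python
import itertools
import math
import numpy as np

def ssnm_unbiased_min_variance(x0, k, S, sigma=1.0, order=40):
    """Evaluate (45) for gamma(x) = x_k, truncating each multi-index entry at `order`."""
    x0 = np.asarray(x0, dtype=float)
    N = x0.size
    t = (x0 / sigma) ** 2
    total = 0.0
    for p in itertools.product(range(order + 1), repeat=N):
        if sum(pj > 0 for pj in p) > S or p[k] == 0:
            continue                      # p must be S-sparse; a_x0[p] = 0 when p_k = 0
        term = p[k] / math.factorial(p[k] - 1) * t[k] ** (p[k] - 1)
        for j in range(N):
            if j != k:
                term *= t[j] ** p[j] / math.factorial(p[j])
        total += term
    return sigma**2 * math.exp(-t.sum()) * total - x0[k] ** 2

sigma = 1.0
x0 = np.array([0.0, 2.0])                 # 1-sparse; k = 0 is not in supp(x0)
M_N = ssnm_unbiased_min_variance(x0, k=0, S=2, sigma=sigma)   # no sparsity (S = N = 2)
M_S = ssnm_unbiased_min_variance(x0, k=0, S=1, sigma=sigma)   # sparsity S = 1
print(M_N, M_S, M_N - M_S)
```

In this example the nonsparse value $M_N$ equals $\sigma^2$, the variance of the LS estimator $y_k$, while $M_S = \sigma^2 e^{-4}$; for this particular toy case the latter also coincides with the bound (35) evaluated for $\mathcal{K} = \{k\}$.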

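We mention the obvious fact that a UMV estimator for $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\cdot))$ and prescribed bias function $c(\cdot)$ exists if and only if the LMV estimator $\hat{g}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ given by (46) does not depend on $\mathbf{x}_0$.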

Finally, consider the SSNM with parameter function $g(\mathbf{x}) = x_k$, i.e., $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, for some $k \in [N]$. Because the specific estimator $\hat{g}(\mathbf{y}) = y_k$ has finite variance and zero bias at each $\mathbf{x} \in \mathcal{X}_S$, the bias function $c_u(\mathbf{x}) \equiv 0$ must be valid for $\mathcal{E}_{\mathrm{SSNM}}$ at each $\mathbf{x}_0 \in \mathcal{X}_S$. Therefore, according to Corollary V.5, the minimum achievable variance for unbiased estimation within the SSNM with parameter function $g(\mathbf{x}) = x_k$, $M_{\mathrm{SSNM}}(c_u(\cdot),\mathbf{x}_0)$, is a lower semi-continuous function of $\mathbf{x}_0$ on its domain, i.e., on $\mathcal{X}_S$. (Note that this remark is not related to Theorem VIII.2.)

B. Diagonal Bias Functions 

In this subsection, we consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$,¹⁰ for some $k \in [N]$, and we study a specific class of bias functions. Let us call a bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ diagonal if $c(\mathbf{x})$ depends only on the $k$th entry of the parameter vector $\mathbf{x}$, i.e., the specific scalar parameter $x_k$ to be estimated. That is, $c(\mathbf{x}) = \tilde{c}(x_k)$, with some function $\tilde{c}(\cdot): \mathbb{R} \to \mathbb{R}$ that may depend on $k$. Similarly, we say that an estimator $\hat{x}_k(\mathbf{y})$ is diagonal if it depends only on the $k$th entry of $\mathbf{y}$, i.e., $\hat{x}_k(\mathbf{y}) = \hat{x}_k(y_k)$ (with an abuse of notation). Clearly, the bias function $b(\hat{x}_k(\cdot);\mathbf{x})$ of a diagonal estimator $\hat{x}_k(\cdot)$ is diagonal, i.e., $b(\hat{x}_k(\cdot);\mathbf{x}) = b(\hat{x}_k(\cdot);x_k)$. Well-known examples of diagonal estimators are the hard- and soft-thresholding estimators described in [2], [10] and the LS estimator, $\hat{x}_{\mathrm{LS},k}(\mathbf{y}) = y_k$. The maximum likelihood estimator for the SSNM is not diagonal, and its bias function is not diagonal either [13].

The following theorem [31, Thm. 5.5.4], which can be regarded as a specialization of Theorem VIII.2 to the case of diagonal bias functions, provides a characterization of the class of valid diagonal bias functions, as well as of the minimum achievable variance and LMV estimator for a prescribed diagonal bias function. In the theorem, we will use the $l$th order (probabilists') Hermite polynomial $H_l(\cdot): \mathbb{R} \to \mathbb{R}$ defined as [36]

$$H_l(x) \triangleq (-1)^l\, e^{x^2/2}\, \frac{d^l}{dx^l}\, e^{-x^2/2}.$$

Furthermore, in the case $\|\mathbf{x}_0\|_0 = S$, the support of $\mathbf{x}_0$ will be denoted as $\mathrm{supp}(\mathbf{x}_0) = \{k_1,\dots,k_S\}$.

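Theorem VIII.3. Consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, $k \in [N]$, at $\mathbf{x}_0 \in \mathcal{X}_S$. Furthermore, consider a prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ that is diagonal and such that the prescribed mean function $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k$ can be written as a convergent power series centered at $\mathbf{x}_0$, i.e.,

$$\gamma(\mathbf{x}) = \sum_{l\in\mathbb{Z}_+} \frac{m_l}{l!}\, (x_k - x_{0,k})^l, \tag{48}$$

with suitable coefficients $m_l$. (Note, in particular, that $m_0 = \gamma(\mathbf{x}_0)$.) In what follows, let

$$B_c \triangleq \sum_{l\in\mathbb{Z}_+} \frac{m_l^2\, \sigma^{2l}}{l!}.$$

1) The bias function $c(\cdot)$ is valid at $\mathbf{x}_0$ if and only if $B_c < \infty$.

2) Assume that $B_c < \infty$, i.e., $c(\cdot)$ is valid. Then:

a) The minimum achievable variance at $\mathbf{x}_0$ is given by

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) = \phi(\mathbf{x}_0)\, B_c - \gamma^2(\mathbf{x}_0),$$

with

$$\phi(\mathbf{x}_0) \triangleq \begin{cases} 1, & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S \\ \displaystyle\sum_{i\in[S]} \exp\!\Big(-\frac{x_{0,k_i}^2}{\sigma^2}\Big) \prod_{j\in[i-1]} \Big[1 - \exp\!\Big(-\frac{x_{0,k_j}^2}{\sigma^2}\Big)\Big], & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1. \end{cases} \tag{49}$$

(Recall that $\mathrm{supp}(\mathbf{x}_0) = \{k_i\}_{i=1}^S$ in the case $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1$; an empty product, as for $i = 1$, equals 1.)

b) The LMV estimator at $\mathbf{x}_0$ is given by

$$\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y}) = \psi(\mathbf{y},\mathbf{x}_0) \sum_{l\in\mathbb{Z}_+} \frac{m_l\, \sigma^l}{l!}\, H_l\Big(\frac{y_k - x_{0,k}}{\sigma}\Big),$$

with

$$\psi(\mathbf{y},\mathbf{x}_0) \triangleq \begin{cases} 1, & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S \\ \displaystyle\sum_{i\in[S]} \exp\!\Big(-\frac{x_{0,k_i}^2 + 2y_{k_i}x_{0,k_i}}{2\sigma^2}\Big) \prod_{j\in[i-1]} \Big[1 - \exp\!\Big(-\frac{x_{0,k_j}^2 + 2y_{k_j}x_{0,k_j}}{2\sigma^2}\Big)\Big], & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1. \end{cases} \tag{50}$$

We recall that the assumption $g(\mathbf{x}) = x_k$ is no restriction, because the MVP for any given parameter function $g(\cdot)$ is equivalent to the MVP for the parameter function $g'(\mathbf{x}) = x_k$ and the modified prescribed bias function $c'(\mathbf{x}) = c(\mathbf{x}) + g(\mathbf{x}) - x_k$.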
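As a numerical sanity check of the case distinction in (49) and (50), the following minimal Python sketch evaluates $\phi(\mathbf{x}_0)$ and $\psi(\mathbf{y},\mathbf{x}_0)$ for a small SSNM instance. The helper names `phi_factor` and `psi_factor` and the numbers ($N = 3$, $S = 2$, $\mathbf{x}_0 = (1, 0.7, 0)^T$, $\sigma^2 = 1$, 0-based index $k = 2$) are illustrative assumptions. The checks rely on two facts that follow from Gaussian moment-generating functions rather than from the text: the sum-of-products form telescopes to $1 - \prod_{i\in[S]}[1 - \exp(-x_{0,k_i}^2/\sigma^2)]$, and under $\mathbf{x}_0$ both $\mathrm{E}\{\psi\}$ and $\mathrm{E}\{\psi^2\}$ equal $\phi(\mathbf{x}_0)$, which the Monte Carlo loop verifies approximately.

```python
import math
import random


def _support(x0):
    return [i for i, v in enumerate(x0) if v != 0.0]


def phi_factor(x0, k, S, sigma2):
    """phi(x0) as in (49), using the sum-of-products form (hypothetical helper)."""
    supp = _support(x0)
    if len(set(supp) | {k}) <= S:
        return 1.0
    total, prod = 0.0, 1.0  # prod = prod_{j < i} [1 - exp(-x_{0,k_j}^2 / sigma^2)]
    for i in supp:
        e = math.exp(-x0[i] ** 2 / sigma2)
        total += e * prod
        prod *= 1.0 - e
    return total


def psi_factor(y, x0, k, S, sigma2):
    """Correction factor psi(y, x0) as in (50); y[k] is never read."""
    supp = _support(x0)
    if len(set(supp) | {k}) <= S:
        return 1.0
    total, prod = 0.0, 1.0
    for i in supp:
        e = math.exp(-(x0[i] ** 2 + 2.0 * y[i] * x0[i]) / (2.0 * sigma2))
        total += e * prod
        prod *= 1.0 - e
    return total


# Example: ||x0||_0 = S and k not in supp(x0), i.e., the case |supp(x0) U {k}| = S + 1
x0, k, S, sigma2 = [1.0, 0.7, 0.0], 2, 2, 1.0
phi = phi_factor(x0, k, S, sigma2)

# Monte Carlo under x0: empirical E{psi} and E{psi^2}
rng = random.Random(0)
n = 100_000
s1 = s2 = 0.0
for _ in range(n):
    y = [xi + math.sqrt(sigma2) * rng.gauss(0.0, 1.0) for xi in x0]
    p = psi_factor(y, x0, k, S, sigma2)
    s1 += p
    s2 += p * p
mean_psi, mean_psi_sq = s1 / n, s2 / n
```

Because `psi_factor` only reads the observations at the support positions, it can be precomputed once per observation and reused for any diagonal estimator $\hat{x}_k(y_k)$.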
Regarding the case distinction in Theorem VIII.3, we note that $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S$ either if $\|\mathbf{x}_0\|_0 < S$ or if both $\|\mathbf{x}_0\|_0 = S$ and $k \in \mathrm{supp}(\mathbf{x}_0)$, and $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1$ if both $\|\mathbf{x}_0\|_0 = S$ and $k \notin \mathrm{supp}(\mathbf{x}_0)$.

If the prescribed bias function $c(\cdot)$ is the actual bias function $b(\hat{x}_k(\cdot);\mathbf{x})$ of some diagonal estimator $\hat{x}_k(\mathbf{y}) = \hat{x}_k(y_k)$ with finite variance at $\mathbf{x}_0$, the coefficients $m_l$ appearing in Theorem VIII.3 have a particular interpretation. For a discussion of this interpretation, we need the following lemma [31].

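Lemma VIII.4. Consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, $k \in [N]$, at $\mathbf{x}_0 \in \mathcal{X}_S$. Furthermore, consider the Hilbert space $\mathcal{P}_{\mathrm{SSNM}}$ consisting of all finite-variance estimator functions $\hat{g}(\cdot): \mathbb{R}^N \to \mathbb{R}$, i.e., $\mathcal{P}_{\mathrm{SSNM}} \triangleq \{\hat{g}(\cdot) \,|\, v(\hat{g}(\cdot);\mathbf{x}_0) < \infty\}$, and endowed with the inner product

$$\big\langle \hat{g}_1(\cdot), \hat{g}_2(\cdot)\big\rangle_{\mathrm{RV}} \triangleq \mathrm{E}_{\mathbf{x}_0}\{\hat{g}_1(\mathbf{y})\,\hat{g}_2(\mathbf{y})\} = \frac{1}{(2\pi\sigma^2)^{N/2}} \int_{\mathbb{R}^N} \hat{g}_1(\mathbf{y})\,\hat{g}_2(\mathbf{y}) \exp\!\Big(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{x}_0\|_2^2\Big)\, d\mathbf{y}.$$

Then, the subset $\mathcal{D}_{\mathrm{SSNM}} \subseteq \mathcal{P}_{\mathrm{SSNM}}$ consisting of all diagonal estimators $\hat{g}(\mathbf{y}) = \hat{g}(y_k)$ is a subspace of $\mathcal{P}_{\mathrm{SSNM}}$, with induced inner product

$$\big\langle \hat{g}_1(\cdot), \hat{g}_2(\cdot)\big\rangle_{\mathrm{RV}} = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}} \hat{g}_1(y)\,\hat{g}_2(y) \exp\!\Big(-\frac{1}{2\sigma^2}(y - x_{0,k})^2\Big)\, dy.$$

An orthonormal basis for $\mathcal{D}_{\mathrm{SSNM}}$ is constituted by $\{h^{(l)}(\cdot)\}_{l\in\mathbb{Z}_+}$, with $h^{(l)}(\cdot): \mathbb{R} \to \mathbb{R}$ given by

$$h^{(l)}(y) = \frac{1}{\sqrt{l!}}\, H_l\Big(\frac{y - x_{0,k}}{\sigma}\Big). \tag{51}$$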
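The orthonormality stated in Lemma VIII.4 can be checked numerically. The sketch below is our own illustration (`hermite_prob` and `basis_gram` are hypothetical names): it builds $H_l$ with the standard three-term recurrence $H_{l+1}(x) = x H_l(x) - l H_{l-1}(x)$ of the probabilists' Hermite polynomials and evaluates the Gram matrix of $h^{(0)},\dots,h^{(7)}$ under $y_k \sim \mathcal{N}(x_{0,k},\sigma^2)$ with NumPy's Gauss–HermiteE quadrature, which integrates these polynomial products exactly up to rounding.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def hermite_prob(l, x):
    """Probabilists' Hermite polynomial H_l(x) via H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), x
    if l == 0:
        return h_prev
    for n in range(1, l):
        h_prev, h = h, x * h - n * h_prev
    return h


def basis_gram(L, x0k, sigma, n_nodes=40):
    """Gram matrix <h^(l), h^(l')>_RV for l, l' < L, with y_k ~ N(x0k, sigma^2)."""
    z, w = hermegauss(n_nodes)   # nodes/weights for the weight exp(-z^2/2)
    y = x0k + sigma * z          # nodes mapped to the y_k axis
    h = np.array([hermite_prob(l, (y - x0k) / sigma) / math.sqrt(math.factorial(l))
                  for l in range(L)])
    # E{f(y_k)} = sum_i w_i f(y_i) / sqrt(2*pi) for this weight function
    return (h * w) @ h.T / math.sqrt(2.0 * math.pi)


G = basis_gram(L=8, x0k=1.3, sigma=0.5)
```

The result does not depend on $x_{0,k}$ or $\sigma$, because the standardization $(y - x_{0,k})/\sigma$ maps every case to a standard normal variable.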

Combining Theorem VIII.3 with Lemma VIII.4 yields the following result [31, Cor. 5.5.7].

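Corollary VIII.5. Consider the SSNM-based estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, $k \in [N]$, at $\mathbf{x}_0 \in \mathcal{X}_S$. Furthermore, consider a prescribed diagonal bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ that is the actual bias function of a diagonal estimator $\hat{x}_k(\mathbf{y}) = \hat{x}_k(y_k)$, i.e., $c(\mathbf{x}) = b(\hat{x}_k(\cdot);\mathbf{x})$. The estimator $\hat{x}_k(\cdot)$ is assumed to have finite variance at $\mathbf{x}_0$, $v(\hat{x}_k(\cdot);\mathbf{x}_0) < \infty$, and hence $\hat{x}_k(\mathbf{y}) \in \mathcal{D}_{\mathrm{SSNM}}$ and, also, $c(\cdot)$ is valid.

1) The prescribed mean function $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k = \mathrm{E}_{\mathbf{x}}\{\hat{x}_k(\mathbf{y})\}$ can be written as a convergent power series (48), with coefficients given by

$$m_l = \frac{\sqrt{l!}}{\sigma^l}\, \big\langle \hat{x}_k(\cdot), h^{(l)}(\cdot)\big\rangle_{\mathrm{RV}} = \frac{1}{\sqrt{2\pi}\,\sigma^{l+1}} \int_{\mathbb{R}} \hat{x}_k(y)\, H_l\Big(\frac{y - x_{0,k}}{\sigma}\Big) \exp\!\Big(-\frac{1}{2\sigma^2}(y - x_{0,k})^2\Big)\, dy. \tag{52}$$

2) The minimum achievable variance at $\mathbf{x}_0$ is given by

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) = v(\hat{x}_k(\cdot);\mathbf{x}_0)\, \phi(\mathbf{x}_0) + [\phi(\mathbf{x}_0) - 1]\, \gamma^2(\mathbf{x}_0), \tag{53}$$

with $\phi(\mathbf{x}_0)$ as defined in (49).

3) The LMV estimator at $\mathbf{x}_0$ is given by

$$\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y}) = \hat{x}_k(y_k)\, \psi(\mathbf{y},\mathbf{x}_0), \tag{54}$$

with $\psi(\mathbf{y},\mathbf{x}_0)$ as defined in (50).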

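It follows from (52) and from Lemma VIII.4 that the given diagonal estimator $\hat{x}_k(\cdot)$ can be written as

$$\hat{x}_k(y) = \sum_{l\in\mathbb{Z}_+} \big\langle \hat{x}_k(\cdot), h^{(l)}(\cdot)\big\rangle_{\mathrm{RV}}\, h^{(l)}(y) = \sum_{l\in\mathbb{Z}_+} \frac{m_l\, \sigma^l}{\sqrt{l!}}\, h^{(l)}(y).$$

Thus, the coefficients $m_l$ appearing in Theorem VIII.3 have the interpretation of being (up to a factor of $\sigma^l/\sqrt{l!}$) the expansion coefficients of the estimator $\hat{x}_k(\cdot)$, viewed as an element of $\mathcal{D}_{\mathrm{SSNM}}$, with respect to the orthonormal basis $\{h^{(l)}(y)\}_{l\in\mathbb{Z}_+}$.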
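The coefficient formula (52) can be evaluated by quadrature once the substitution $y = x_{0,k} + \sigma z$ is made, which turns it into $m_l = \mathrm{E}\{\hat{x}_k(y_k)\, H_l(z)\}/\sigma^l$ with $z \sim \mathcal{N}(0,1)$. The sketch below uses hypothetical helpers (`mean_coefficients`, `b_c`) and is tested on two smooth diagonal estimators with known mean functions: $\hat{x}_k(\mathbf{y}) = y_k$ (so $\gamma(\mathbf{x}) = x_k$) and $\hat{x}_k(\mathbf{y}) = y_k^2$ (so $\gamma(\mathbf{x}) = x_k^2 + \sigma^2$). For the second one, $B_c$ should equal $\mathrm{E}_{\mathbf{x}_0}\{y_k^4\}$, in line with the orthonormal expansion above. Gauss quadrature is a poor fit for discontinuous estimators such as hard thresholding, so this is a sketch for smooth cases only.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def mean_coefficients(estimator, x0k, sigma, L, n_nodes=60):
    """m_0, ..., m_{L-1} from (52) for a diagonal estimator y_k -> estimator(y_k)."""
    z, w = hermegauss(n_nodes)           # weight exp(-z^2/2); sum(w) = sqrt(2*pi)
    values = estimator(x0k + sigma * z)  # estimator evaluated at y_k = x0k + sigma*z
    m = np.empty(L)
    for l in range(L):
        coeffs = np.zeros(l + 1)
        coeffs[l] = 1.0                  # selects H_l in the HermiteE series
        # m_l = E{estimator(y_k) H_l(z)} / sigma^l
        m[l] = np.sum(w * values * hermeval(z, coeffs)) / math.sqrt(2 * math.pi) / sigma**l
    return m


def b_c(m, sigma):
    """B_c = sum_l m_l^2 sigma^(2l) / l!, truncated to the supplied coefficients."""
    return sum(ml**2 * sigma**(2 * l) / math.factorial(l) for l, ml in enumerate(m))


x0k, sigma = 0.8, 0.5
m_ls = mean_coefficients(lambda y: y, x0k, sigma, L=5)     # LS estimator y_k
m_sq = mean_coefficients(lambda y: y**2, x0k, sigma, L=5)  # biased estimator y_k^2
```

For $\hat{x}_k(\mathbf{y}) = y_k^2$ the expected coefficients are $m_0 = x_{0,k}^2 + \sigma^2$, $m_1 = 2x_{0,k}$, $m_2 = 2$, and $m_l = 0$ for $l \ge 3$, i.e., the Taylor coefficients of $\gamma$ in (48).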

Remarkably, as shown by (54), the LMV estimator can be obtained by multiplying the diagonal estimator $\hat{x}_k(\mathbf{y})$, which is arbitrary except for the condition that its variance at $\mathbf{x}_0$ is finite, by the "correction factor" $\psi(\mathbf{y},\mathbf{x}_0)$ in (50). It can be easily verified that $\psi(\mathbf{y},\mathbf{x}_0)$ does not depend on $y_k$. According to (50), the following two cases have to be distinguished:

1) For $k \in [N]$ such that $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S$, we have $\psi(\mathbf{y},\mathbf{x}_0) = 1$, and therefore the LMV estimator is obtained from (54) as $\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y}) = \hat{x}_k(y_k) = \hat{x}_k(\mathbf{y})$. Thus, in that case, it follows from Corollary VIII.5 that every diagonal estimator $\hat{x}_k(\cdot): \mathbb{R}^N \to \mathbb{R}$ for the SSNM that has finite variance at $\mathbf{x}_0$ is necessarily an LMV estimator. In particular, the variance $v(\hat{x}_k(\cdot);\mathbf{x}_0)$ equals the minimum achievable variance $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0)$, i.e., the Barankin bound. Furthermore, the sparsity information cannot be leveraged for improved MVE, because the estimator $\hat{x}_k(\cdot)$ is an LMV estimator for the parameter set $\mathcal{X}_S$ with arbitrary $S$, including the nonsparse case $\mathcal{X} = \mathbb{R}^N$.

2) For $k \in [N]$ such that $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1$, it follows from Corollary VIII.5 and (49) that there exist estimators (in particular, the LMV estimator $\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\mathbf{y})$) with the same bias function as $\hat{x}_k(\cdot)$ but with a smaller variance at $\mathbf{x}_0$. Indeed, in this case, we have $\phi(\mathbf{x}_0) < 1$ in (49), and by (53) it thus follows that $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) < v(\hat{x}_k(\cdot);\mathbf{x}_0)$.
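Case 2) can be illustrated by simulation in the unbiased setting, where the LS estimator $y_k$ serves as the given diagonal estimator and (54) yields $y_k\,\psi(\mathbf{y},\mathbf{x}_0)$. The following Monte Carlo sketch uses a hypothetical setup ($N = 3$, $S = 2$, $\mathbf{x}_0 = (1, 0.7, 0)^T$, $k = 3$ in the text's 1-based indexing, $\sigma^2 = 1$; helper names are ours). It checks that this estimator is empirically unbiased at other $S$-sparse points with $x_k \neq 0$, and that its variance at $\mathbf{x}_0$ is close to $\sigma^2\phi(\mathbf{x}_0)$, as predicted by (53) with $\gamma(\mathbf{x}_0) = x_{0,k} = 0$, and hence below the LS variance $\sigma^2$.

```python
import numpy as np


def psi(Y, x0, k, S, sigma2):
    """Vectorized correction factor psi(y, x0) from (50); rows of Y are observations."""
    supp = np.flatnonzero(x0)
    if len(set(supp.tolist()) | {k}) <= S:
        return np.ones(Y.shape[0])
    total, prod = np.zeros(Y.shape[0]), np.ones(Y.shape[0])
    for i in supp:
        e = np.exp(-(x0[i] ** 2 + 2.0 * Y[:, i] * x0[i]) / (2.0 * sigma2))
        total += e * prod
        prod *= 1.0 - e
    return total


def lmv_unbiased(Y, x0, k, S, sigma2):
    """LMV estimator (54) for c = 0, built from the LS estimator y_k."""
    return Y[:, k] * psi(Y, x0, k, S, sigma2)


rng = np.random.default_rng(0)
sigma2, S, k = 1.0, 2, 2                 # k = 2 is 0-based (third entry)
x0 = np.array([1.0, 0.7, 0.0])           # ||x0||_0 = S and k not in supp(x0)
phi = 1.0 - np.prod(1.0 - np.exp(-x0[x0 != 0] ** 2 / sigma2))  # (49), telescoped form


def draw(x, n=400_000):
    """n observations y = x + noise, noise ~ N(0, sigma2 * I)."""
    return x + np.sqrt(sigma2) * rng.standard_normal((n, x.size))


var_at_x0 = np.var(lmv_unbiased(draw(x0), x0, k, S, sigma2))
# Empirical means at two other S-sparse points with x_k != 0
mean_a = np.mean(lmv_unbiased(draw(np.array([0.0, 0.5, 0.9])), x0, k, S, sigma2))
mean_b = np.mean(lmv_unbiased(draw(np.array([1.2, 0.0, -0.6])), x0, k, S, sigma2))
```

The variance reduction is purely local: the estimator is tuned to $\mathbf{x}_0$ through $\psi(\mathbf{y},\mathbf{x}_0)$, while unbiasedness holds on all of $\mathcal{X}_S$.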

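Let us for the moment make the (weak) assumption that the given diagonal estimator $\hat{x}_k(\cdot)$ has finite variance at every parameter vector $\mathbf{x} \in \mathbb{R}^N$. It can then be shown that the LMV estimator $\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ is robust to deviations from the nominal parameter $\mathbf{x}_0$ in the sense that its bias and variance depend continuously on $\mathbf{x}_0$. Furthermore, $\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ has finite bias and finite variance at any parameter vector $\mathbf{x} \in \mathbb{R}^N$, i.e., $|b(\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\cdot);\mathbf{x})| < \infty$ and $v(\hat{x}_k^{(c(\cdot),\mathbf{x}_0)}(\cdot);\mathbf{x}) < \infty$ for all $\mathbf{x} \in \mathbb{R}^N$.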

We finally note that Corollary VIII.5 also applies to unbiased estimation, i.e., prescribed bias function $c(\cdot) \equiv 0$ (equivalently, $\gamma(\mathbf{x}) = x_k$). This is because $c(\cdot) \equiv 0$ is the actual bias function of the LS estimator $\hat{x}_{\mathrm{LS},k}(\mathbf{y}) = y_k$. Clearly, the LS estimator is diagonal and has finite variance at $\mathbf{x}_0$. Thus, it can be used as the given diagonal estimator $\hat{x}_k(\mathbf{y})$ in Corollary VIII.5.

C. Lower Variance Bounds 

Finally, we complement the exact expressions of the minimum achievable variance $M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0)$ presented above by simple lower bounds. The following bound is obtained by specializing the sparse CRB in Theorem VI.1 to the SSNM ($\mathbf{H} = \mathbf{I}$).



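Corollary VIII.6. Consider the estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$. Let $\mathbf{x}_0 \in \mathcal{X}_S$. If the prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is such that the partial derivatives $\frac{\partial c(\mathbf{x})}{\partial x_l}\big|_{\mathbf{x}=\mathbf{x}_0}$ exist for all $l \in [N]$, then

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) \ge \begin{cases} \sigma^2\,\|\mathbf{b}\|_2^2, & \text{if } \|\mathbf{x}_0\|_0 < S \\ \sigma^2\,\|\mathbf{b}_{\mathbf{x}_0}\|_2^2, & \text{if } \|\mathbf{x}_0\|_0 = S. \end{cases} \tag{55}$$

Here, in the case $\|\mathbf{x}_0\|_0 < S$, $\mathbf{b} \in \mathbb{R}^N$ is given by $b_l = \delta_{k,l} + \frac{\partial c(\mathbf{x})}{\partial x_l}\big|_{\mathbf{x}=\mathbf{x}_0}$, $l \in [N]$, and in the case $\|\mathbf{x}_0\|_0 = S$, $\mathbf{b}_{\mathbf{x}_0} \in \mathbb{R}^S$ consists of those entries of $\mathbf{b}$ that are indexed by $\mathrm{supp}(\mathbf{x}_0) = \{k_1,\dots,k_S\}$, i.e., $(\mathbf{b}_{\mathbf{x}_0})_i = b_{k_i}$, $i \in [S]$.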
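A direct transcription of (55) may help when evaluating the bound for a given bias gradient. In this hypothetical helper (`sparse_crb_ssnm` is our name; indices are 0-based), `grad_c` holds the partial derivatives $\partial c(\mathbf{x})/\partial x_l$ at $\mathbf{x}_0$, which in practice must be obtained analytically or numerically. The checks use the unbiased case $c(\cdot) \equiv 0$: for $\|\mathbf{x}_0\|_0 = S$ the bound is $\sigma^2$ if $k \in \mathrm{supp}(\mathbf{x}_0)$ and zero otherwise.

```python
import numpy as np


def sparse_crb_ssnm(x0, k, grad_c, S, sigma2):
    """Bound (55) for the SSNM; grad_c[l] = dc/dx_l at x0 (length N)."""
    x0 = np.asarray(x0, dtype=float)
    b = np.eye(x0.size)[k] + np.asarray(grad_c, dtype=float)  # b_l = delta_{k,l} + dc/dx_l
    supp = np.flatnonzero(x0)
    if supp.size < S:                      # ||x0||_0 < S: use the full vector b
        return sigma2 * float(b @ b)
    return sigma2 * float(b[supp] @ b[supp])  # ||x0||_0 = S: entries on supp(x0) only
```

For a shrinkage-type bias $c(\mathbf{x}) = -\alpha x_k$, for instance, one would pass `grad_c` with $-\alpha$ in position $k$, giving $\sigma^2(1-\alpha)^2$ whenever $k \in \mathrm{supp}(\mathbf{x}_0)$ and $\|\mathbf{x}_0\|_0 = S$.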
Specializing the alternative bound in Theorem VI.2 to the SSNM yields the following result.

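Corollary VIII.7. Consider the estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$. Let $\mathbf{x}_0 \in \mathcal{X}_S$, and consider an arbitrary index set $\mathcal{K} = \{k_1,\dots,k_{|\mathcal{K}|}\} \subseteq [N]$ consisting of no more than $S$ indices, i.e., $|\mathcal{K}| \le S$. If the prescribed bias function $c(\cdot): \mathcal{X}_S \to \mathbb{R}$ is such that the partial derivatives $\frac{\partial c(\mathbf{x})}{\partial x_{k_i}}\big|_{\mathbf{x}=\mathbf{x}_0}$ exist for all $k_i \in \mathcal{K}$, then

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\big\|\mathbf{x}_0^{([N]\setminus\mathcal{K})}\big\|_2^2\Big)\Big[\sigma^2\|\mathbf{b}_{\mathbf{x}_0}\|_2^2 + \gamma^2(\mathbf{x}_0)\Big] - \gamma^2(\mathbf{x}_0).$$

Here, $\mathbf{b}_{\mathbf{x}_0} \in \mathbb{R}^{|\mathcal{K}|}$ is defined elementwise as $(\mathbf{b}_{\mathbf{x}_0})_i \triangleq \delta_{k,k_i} + \frac{\partial c(\mathbf{x})}{\partial x_{k_i}}\big|_{\mathbf{x}=\mathbf{x}_0}$ for $i \in [|\mathcal{K}|]$, and $\gamma(\mathbf{x}) = c(\mathbf{x}) + x_k$.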



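Furthermore, the modified bound in (35) specialized to the SSNM reads as

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\|(\mathbf{I} - \mathbf{P})\mathbf{x}_0\|_2^2\Big)\, \sigma^2\|\mathbf{b}_{\mathbf{x}_0}\|_2^2. \tag{56}$$

Because $\mathbf{H} = \mathbf{I}$, we have $\mathbf{P} = \mathbf{H}_{\mathcal{K}}\mathbf{H}_{\mathcal{K}}^{\dagger} = \mathbf{I}_{\mathcal{K}}\mathbf{I}_{\mathcal{K}}^T = \sum_{l\in\mathcal{K}} \mathbf{e}_l\mathbf{e}_l^T$. Therefore, multiplying $\mathbf{x}_0$ by $\mathbf{I} - \mathbf{P}$ simply zeros all entries of $\mathbf{x}_0$ whose indices belong to $\mathcal{K}$, i.e., $(\mathbf{I} - \mathbf{P})\mathbf{x}_0 = \mathbf{x}_0^{([N]\setminus\mathcal{K})}$, and thus (56) becomes

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) \ge \exp\!\Big(-\frac{1}{\sigma^2}\big\|\mathbf{x}_0^{([N]\setminus\mathcal{K})}\big\|_2^2\Big)\, \sigma^2\|\mathbf{b}_{\mathbf{x}_0}\|_2^2. \tag{57}$$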
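The bound (57) only needs the entries of $\mathbf{x}_0$ outside $\mathcal{K}$ and the $|\mathcal{K}|$ bias derivatives. The sketch below (hypothetical helper `bound_57`, 0-based indices) evaluates it. For $N = 2$, $S = 1$, $\mathbf{x}_0 = (a, 0)^T$, $\mathcal{K} = \{k\}$ with $k$ the zero entry, and $c(\cdot) \equiv 0$, it reduces to $\sigma^2\exp(-a^2/\sigma^2)$; choosing $\mathcal{K} = \mathrm{supp}(\mathbf{x}_0)$ with $k \in \mathcal{K}$ removes the exponential penalty.

```python
import math

import numpy as np


def bound_57(x0, k, K, grad_c_K, sigma2):
    """Bound (57) for the SSNM.

    K        -- index set with |K| <= S (distinct, 0-based indices)
    grad_c_K -- partial derivatives dc/dx_{k_i} at x0, one per index in K
    """
    x0 = np.asarray(x0, dtype=float)
    outside = np.ones(x0.size, dtype=bool)
    outside[list(K)] = False
    residual = x0[outside]  # nonzero part of (I - P) x0: the entries of x0 outside K
    b = np.array([(1.0 if ki == k else 0.0) + g for ki, g in zip(K, grad_c_K)])
    return math.exp(-float(residual @ residual) / sigma2) * sigma2 * float(b @ b)


a, sigma2 = 1.5, 0.8
val = bound_57([a, 0.0], k=1, K=[1], grad_c_K=[0.0], sigma2=sigma2)  # unbiased, N = 2, S = 1
```

Different choices of $\mathcal{K}$ trade the exponential factor against the length of $\mathbf{b}_{\mathbf{x}_0}$, which is why the index sets are treated as free parameters in the numerical comparisons.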

For unbiased estimation ($c(\cdot) \equiv 0$), the following lower bound on $M_{\mathrm{SSNM}}(c(\cdot) \equiv 0,\mathbf{x}_0)$ is based on the Hammersley–Chapman–Robbins bound (HCRB) [15], [29], [48]. This bound has been previously derived in a slightly different form in [13].

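Theorem VIII.8. Consider the estimation problem $\mathcal{E}_{\mathrm{SSNM}} = (\mathcal{X}_S, f_{\mathbf{I}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ with $k \in [N]$ and the prescribed bias function $c(\cdot) \equiv 0$. Let $\mathbf{x}_0 \in \mathcal{X}_S$. Then,

$$M_{\mathrm{SSNM}}(c(\cdot),\mathbf{x}_0) \ge \begin{cases} \sigma^2, & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S \\ \sigma^2\, \dfrac{N-S-1}{N-S}\, \exp(-\xi_0^2/\sigma^2), & \text{if } |\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1, \end{cases} \tag{58}$$

where $\xi_0$ denotes the value of the $S$-largest (in magnitude) entry of $\mathbf{x}_0$.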

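In [31, Thm. 5.4.2], it is shown that the bound (58) for $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S$ is obtained from the generic bound (31) by using for the subspace $\mathcal{U}$ the limit of $\mathcal{U}^{(t)} = \mathrm{span}\{u_0(\cdot), \{u_l^{(t)}(\cdot)\}_{l\in[N]}\}$ as $t \to 0$. Here, $u_0(\cdot) = R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0)$ and

$$u_l^{(t)}(\cdot) = \begin{cases} R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0 + t\mathbf{e}_l) - R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0), & \text{if } l \in \mathrm{supp}(\mathbf{x}_0) \\ R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0 - \xi_0\mathbf{e}_{j_0} + t\mathbf{e}_l) - R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0), & \text{if } l \in [N]\setminus\mathrm{supp}(\mathbf{x}_0), \end{cases} \qquad l \in [N],$$

where $j_0$ denotes the index of the $S$-largest (in magnitude) entry of $\mathbf{x}_0$. Similarly, the bound (58) for $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1$ is obtained from (31) by using for $\mathcal{U}$ the limit of $\mathcal{U}^{(t)} = \mathrm{span}\{u_0(\cdot), u_k^{(t)}(\cdot)\}$ as $t \to 0$, where $u_k^{(t)}(\cdot) = R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0 + t\mathbf{e}_k) - R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\mathbf{x}_0)$. (An expression of $R_{\mathrm{SSNM},\mathbf{x}_0}(\cdot,\cdot)$ was given earlier.) In [13], an equivalent bound on the MSE (equivalently, on the variance, because $c(\cdot) \equiv 0$) was formulated for a vector-valued estimator $\hat{\mathbf{x}}(\cdot)$; that bound can be obtained by summing (58) over all $k \in [N]$.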



Fig. 1. Examples of $\ell_q$-balls of radius $S = 1$, $\mathcal{B}_q(1)$, in $\mathbb{R}^2$ (axes $x_1$, $x_2$): (a) $q = 0$, (b) $q = 0.25$, (c) $q = 0.75$, (d) $q = 1$.



It can be shown that the HCRB-type bound (58) is tighter (higher) than the CRB (55) specialized to $c(\cdot) \equiv 0$. For $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| = S+1$ (which is true if both $\|\mathbf{x}_0\|_0 = S$ and $k \notin \mathrm{supp}(\mathbf{x}_0)$), the HCRB-type bound (58) is a strictly upper semi-continuous function of $\mathbf{x}_0$, just as the CRB (55). Hence, it again follows from Corollary IV.5 that the bound cannot be tight, i.e., in general, we have a strict inequality in (58). However, for $|\mathrm{supp}(\mathbf{x}_0)\cup\{k\}| \le S$ (which is true either if $\|\mathbf{x}_0\|_0 < S$ or if both $\|\mathbf{x}_0\|_0 = S$ and $k \in \mathrm{supp}(\mathbf{x}_0)$), the bound (58) is tight since it is achieved by the LS estimator $\hat{x}_{\mathrm{LS},k}(\mathbf{y}) = y_k$.



IX. Exact versus Approximate Sparsity 

So far, the parameter set $\mathcal{X}$ has been the set $\mathcal{X}_S$ of $S$-sparse vectors. In this section, we consider an approximate version of $S$-sparsity, which is modeled by a modified parameter set $\mathcal{X}$. Following [?], [10], and [3], we define this modified parameter set to be the $\ell_q$-ball of radius $S$, i.e.,

$$\mathcal{X} = \mathcal{B}_q(S) \triangleq \{\mathbf{x}' \in \mathbb{R}^N \,|\, \|\mathbf{x}'\|_q^q \le S\}, \quad \text{with } 0 \le q \le 1.$$

The parameter set $\mathcal{X}_S$ of "exactly" $S$-sparse vectors is a special case obtained for $q = 0$, i.e., $\mathcal{X}_S = \mathcal{B}_0(S)$. In Fig. 1, we illustrate $\mathcal{B}_q(S)$ in $\mathbb{R}^2$ for $S = 1$ and various values of $q$. In contrast to $\mathcal{X}_S = \mathcal{B}_0(S)$, the parameter sets $\mathcal{B}_q(S)$ with $q > 0$ are bounded, i.e., for every $q > 0$ and $S \in [N]$, $\mathcal{B}_q(S)$ is contained in a finite ball about $\mathbf{0}$. Thus, the set $\mathcal{X}_S$ of exactly $S$-sparse vectors is not a subset of $\mathcal{B}_q(S)$ for any $q > 0$.
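The boundedness argument can be made concrete with a membership test for $\mathcal{B}_q(S)$. In this minimal sketch (the helper `in_lq_ball` is hypothetical), $q = 0$ is treated as counting nonzero entries. The test vector $(100, 0)^T$ is 1-sparse, hence in $\mathcal{X}_1 = \mathcal{B}_0(1)$, but it lies outside $\mathcal{B}_q(1)$ for $q = 0.25$ and $q = 1$, since $100^q > 1$ for every $q > 0$.

```python
import numpy as np


def in_lq_ball(x, q, S):
    """Membership test for B_q(S) = {x : ||x||_q^q <= S}; q = 0 counts nonzero entries."""
    x = np.asarray(x, dtype=float)
    if q == 0:
        return int(np.count_nonzero(x)) <= S
    return float(np.sum(np.abs(x) ** q)) <= S


x_sparse = [100.0, 0.0]  # 1-sparse but far from the origin
```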
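For a given system matrix $\mathbf{H} \in \mathbb{R}^{M\times N}$, sparsity degree $S < N$, and index $k \in [N]$, let us consider the estimation problem

$$\mathcal{E}^{(q)} \triangleq \big(\mathcal{B}_q(S), f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k\big).$$

Note that $\mathcal{E}^{(q)}$ differs from the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ only in the parameter set $\mathcal{X}$, which is $\mathcal{B}_q(S)$ instead of $\mathcal{X}_S$. Because $\mathcal{B}_0(S) = \mathcal{X}_S$, we have $\mathcal{E}^{(0)} = \mathcal{E}_{\mathrm{SLGM}}$. Furthermore, we consider a bias function $c(\cdot): \mathbb{R}^N \to \mathbb{R}$ that is defined on all of $\mathbb{R}^N$, and a parameter vector $\mathbf{x}_0 \in \mathcal{B}_q(S) \cap \mathcal{X}_S$.

For $\mathcal{E}_{\mathrm{SLGM}}$, as before, the bias function $c(\cdot)$ is prescribed on $\mathcal{X}_S$, i.e., we consider estimators $\hat{x}_k(\cdot)$ satisfying (cf. (1))

$$b(\hat{x}_k(\cdot);\mathbf{x}) = c(\mathbf{x}), \quad \text{for all } \mathbf{x} \in \mathcal{X}_S.$$

Again as before, the minimum achievable variance at $\mathbf{x}_0$ is denoted as $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$. On the other hand, for $\mathcal{E}^{(q)}$, the bias function $c(\cdot)$ is prescribed on $\mathcal{B}_q(S)$, i.e., we consider estimators $\hat{x}_k(\cdot)$ satisfying

$$b(\hat{x}_k(\cdot);\mathbf{x}) = c(\mathbf{x}), \quad \text{for all } \mathbf{x} \in \mathcal{B}_q(S).$$

Here, the minimum achievable variance at $\mathbf{x}_0$ is denoted as $M^{(q)}(c(\cdot),\mathbf{x}_0)$.

Evidently, because $\mathcal{B}_0(S) = \mathcal{X}_S$ and $\mathcal{E}^{(0)} = \mathcal{E}_{\mathrm{SLGM}}$, we have $M^{(0)}(c(\cdot),\mathbf{x}_0) = M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$. It seems tempting to conjecture that $M^{(q)}(c(\cdot),\mathbf{x}_0) \approx M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ for $q \approx 0$, i.e., that changing the parameter set $\mathcal{X}$ from $\mathcal{X}_S = \mathcal{B}_0(S)$ to $\mathcal{B}_q(S)$ with $q > 0$, and hence considering $\mathcal{E}^{(q)}$ instead of $\mathcal{E}_{\mathrm{SLGM}}$, should not result in a significantly different minimum achievable variance as long as $q$ is sufficiently small. However, the next result [31, Thm. 5.6.1] implies that there is a decisive difference, no matter how small $q$ is.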

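Theorem IX.1. Consider a subset $\mathcal{X} \subseteq \mathbb{R}^N$ that contains an open set, and a function $c(\cdot): \mathbb{R}^N \to \mathbb{R}$ that is valid at some $\mathbf{x}_0 \in \mathcal{X}$ for the LGM-based estimation problem $\mathcal{E}_{\mathrm{LGM}} = (\mathbb{R}^N, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, with some system matrix $\mathbf{H}$ that does not necessarily satisfy condition (?). Let $M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0)$ denote the minimum achievable variance at $\mathbf{x}_0$ for $\mathcal{E}_{\mathrm{LGM}}$ with bias function $c(\cdot)$ prescribed on $\mathbb{R}^N$. Furthermore, let $M'(c(\cdot),\mathbf{x}_0)$ denote the minimum achievable variance at $\mathbf{x}_0$ for the estimation problem $\mathcal{E}' = (\mathcal{X}, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ with bias function $c(\cdot)$ prescribed on $\mathcal{X}$. Then

$$M'(c(\cdot),\mathbf{x}_0) = M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0).$$

Moreover, the LMV estimator $\hat{x}_{\mathrm{LGM}}^{(c(\cdot),\mathbf{x}_0)}(\cdot)$ for $\mathcal{E}_{\mathrm{LGM}}$ and bias function $c(\cdot)$ is simultaneously the LMV estimator for $\mathcal{E}'$ and bias function $c(\cdot)|_{\mathcal{X}}$.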

Since for $q > 0$ the parameter set $\mathcal{X} = \mathcal{B}_q(S)$ contains an open set, Theorem IX.1 implies that

$$M^{(q)}(c(\cdot),\mathbf{x}_0) = M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0), \quad \text{for all } q > 0.$$

Thus, the minimum achievable variance for $\mathcal{E}^{(q)}$, $q > 0$, with bias function $c(\cdot)$ prescribed on $\mathcal{B}_q(S)$ is always equal to the minimum achievable variance for $\mathcal{E}_{\mathrm{LGM}}$ with bias function $c(\cdot)$ prescribed on $\mathbb{R}^N$. Furthermore, Theorem IX.1 also implies that the minimum achievable variance for $\mathcal{E}^{(q)} = (\mathcal{B}_q(S), f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$, $q > 0$, is achieved by the LMV estimator for $\mathcal{E}_{\mathrm{LGM}} = (\mathbb{R}^N, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$. (This LMV estimator is given by Part 3 of Theorem IV.4 specialized to $S = N$, in which case the SLGM reduces to the LGM.) But since in general $M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0) > M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ (see (47) for the special case given by the SSNM), it follows that $M^{(q)}(c(\cdot),\mathbf{x}_0) = M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0)$ does not generally converge to $M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$ as $q$ approaches 0.

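For another interesting consequence of Theorem IX.1, consider an estimation problem $\mathcal{E} = (\mathcal{X}, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ whose parameter set $\mathcal{X}$ is the union of the set of exactly $S$-sparse vectors $\mathcal{X}_S$ and an open ball $\mathcal{B}(\mathbf{x}_c, r) \triangleq \{\mathbf{x} \in \mathbb{R}^N \,|\, \|\mathbf{x} - \mathbf{x}_c\|_2 < r\}$, i.e., $\mathcal{X} = \mathcal{X}_S \cup \mathcal{B}(\mathbf{x}_c, r)$. Then, it follows from Theorem IX.1 that the minimum achievable variance for $\mathcal{E}$ at any sparse $\mathbf{x}_0 \in \mathcal{X}_S$ coincides with $M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0)$. Since in general $M_{\mathrm{LGM}}(c(\cdot),\mathbf{x}_0) > M_{\mathrm{SLGM}}(c(\cdot),\mathbf{x}_0)$, this implies that the minimum achievable variance for $\mathcal{E}$ is in general strictly larger than the minimum achievable variance for the SLGM. Thus, no matter how small the radius $r$ is and how distant $\mathbf{x}_c$ is from $\mathcal{X}_S$, the inclusion of the open ball in $\mathcal{X}$ significantly affects the MVE of the $S$-sparse vectors in $\mathcal{X}_S$.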

The statement of Theorem IX.1 is closely related to the facts that (i) the statistical model of the LGM belongs to an exponential family, and (ii) the mean function $\gamma(\mathbf{x}) = \mathrm{E}_{\mathbf{x}}\{\hat{g}(\mathbf{y})\}$ of any estimator $\hat{g}(\cdot)$ with finite bias and variance for an estimation problem whose statistical model belongs to an exponential family is an analytic function [34, Lemma 2.8]. Indeed, any analytic function is completely determined by its values on an arbitrary open set in its domain [19]. Therefore, because the mean function $\gamma(\mathbf{x})$ of any estimator for the LGM is analytic, it is completely specified by its values for all $\mathbf{x} \in \mathcal{B}_q(S)$ with an arbitrary $q > 0$ (note that $\mathcal{B}_q(S)$ contains an open set).

X. Numerical Results 

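In this section, we compare the lower variance bounds presented in Section VI with the actual variance behavior of some well-known estimators. We consider the SLGM-based estimation problem $\mathcal{E}_{\mathrm{SLGM}} = (\mathcal{X}_S, f_{\mathbf{H}}(\mathbf{y};\mathbf{x}), g(\mathbf{x}) = x_k)$ for $k \in [N]$. In what follows, we will denote the lower bounds (32), (34), and (35) by $B_k^{(1)}(c(\cdot),\mathbf{x}_0)$, $B_k^{(2)}(c(\cdot),\mathbf{x}_0)$, and $B_k^{(3)}(c(\cdot),\mathbf{x}_0)$, respectively. We recall that the latter two bounds depend on an index set $\mathcal{K} \subseteq [N]$ with $|\mathcal{K}| \le S$, which can be chosen freely.

Let $\hat{\mathbf{x}}(\cdot)$ be an estimator of $\mathbf{x}$ with bias function $\mathbf{c}(\cdot)$. Because the variance of a vector-valued estimator is the sum of the variances of its entries, a lower bound on the estimator variance $v(\hat{\mathbf{x}}(\cdot);\mathbf{x}_0)$ can be obtained by summing with respect to $k \in [N]$ the "scalar bounds" $B_k^{(1)}(c_k(\cdot),\mathbf{x}_0)$ or $B_k^{(2)}(c_k(\cdot),\mathbf{x}_0)$ or $B_k^{(3)}(c_k(\cdot),\mathbf{x}_0)$, where $c_k(\cdot) = (\mathbf{c}(\cdot))_k$, i.e.,

$$v(\hat{\mathbf{x}}(\cdot);\mathbf{x}_0) \ge B^{(1/2/3)}(\mathbf{c}(\cdot),\mathbf{x}_0) \triangleq \sum_{k\in[N]} B_k^{(1/2/3)}(c_k(\cdot),\mathbf{x}_0). \tag{59}$$

Here, the index sets $\mathcal{K}_k$ used in $B_k^{(2)}(c_k(\cdot),\mathbf{x}_0)$ and $B_k^{(3)}(c_k(\cdot),\mathbf{x}_0)$ can be chosen differently for different $k$.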

A. An SLGM View of Fourier Analysis 

Our first example is inspired by [?, Example 4.2]. We consider the SLGM with $N$ even, i.e., $N = 2L$, and $\sigma^2 = 1$. The system matrix $\mathbf{H} \in \mathbb{R}^{M\times 2L}$ is given by $H_{m,l} = \cos(\theta_l(m-1))$ for $m \in [M]$ and $l \in [L]$ and $H_{m,l} = \sin(\theta_l(m-1))$ for $m \in [M]$ and $l \in \{L+1,\dots,2L\}$. Here, the normalized angular frequencies $\theta_l$ are uniformly spaced according to $\theta_l = \theta_0 + [(l-1) \bmod L]\,\Delta\theta$, $l \in [N]$. The multiplication of $\mathbf{x}$ by $\mathbf{H}$ then corresponds to an inverse discrete Fourier transform that maps $2L$ spectral samples (the entries of $\mathbf{x}$) to $M$ temporal samples (the entries of $\mathbf{H}\mathbf{x}$). In our simulation, we chose $M = 128$, $L = 8$ (hence, $N = 16$), $S = 4$, $\theta_0 = 0.2$, and $\Delta\theta = 3.9 \cdot 10^{-3}$. The frequency spacing $\Delta\theta$ is about half the nominal DFT frequency resolution, which is $1/128 \approx 7.8 \times 10^{-3}$.

Fig. 2. Variance of the OMP estimator and corresponding lower bounds versus SNR [dB], for the SLGM with $N = 16$, $M = 128$, $S = 4$, and $\sigma^2 = 1$ (the plotted curves include the oracle CRB).
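The system matrix of this example can be built directly from its definition. The sketch below follows the text literally, i.e., $H_{m,l} = \cos(\theta_l(m-1))$ and $H_{m,l} = \sin(\theta_l(m-1))$ with $\theta_l$ entering as given (no extra $2\pi$ factor); that reading and the helper names `fourier_slgm_matrix` and `oracle_crb` are assumptions. The oracle CRB is computed as $\sigma^2\,\mathrm{tr}\big((\mathbf{H}_{\mathrm{supp}}^T\mathbf{H}_{\mathrm{supp}})^{-1}\big)$ for the 0-based support $\{2, 5, 10, 13\}$; because the frequency-scaling convention is assumed, the value it produces need not match the oracle CRB reported for this example.

```python
import numpy as np


def fourier_slgm_matrix(M=128, L=8, theta0=0.2, dtheta=3.9e-3):
    """Cosine columns for l = 1..L and sine columns for l = L+1..2L."""
    N = 2 * L
    l = np.arange(1, N + 1)                  # 1-based column index
    theta = theta0 + ((l - 1) % L) * dtheta  # theta_l = theta_0 + [(l-1) mod L] * dtheta
    m = np.arange(1, M + 1)[:, None]         # 1-based time index as a column vector
    H = np.empty((M, N))
    H[:, :L] = np.cos(theta[:L] * (m - 1))
    H[:, L:] = np.sin(theta[L:] * (m - 1))
    return H, theta


def oracle_crb(H, support, sigma2=1.0):
    """CRB for known support: sigma^2 * tr((H_S^T H_S)^{-1})."""
    Hs = H[:, support]
    return sigma2 * float(np.trace(np.linalg.inv(Hs.T @ Hs)))


H, theta = fourier_slgm_matrix()
support = [2, 5, 10, 13]  # 0-based version of supp = {3, 6, 11, 14}
crb = oracle_crb(H, support)
```

A cheap consistency check is that the trace of an inverse Gram matrix can never be smaller than the sum of the reciprocal squared column norms.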

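We consider the OMP estimator $\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot)$ that is obtained by applying OMP [?], [?] with $S = 4$ iterations to the observation $\mathbf{y}$. We used Monte Carlo simulation with randomly generated noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0},\mathbf{I})$ to estimate the variance $v(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x}_0)$ of $\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot)$. The parameter vector was chosen as $\mathbf{x}_0 = \sqrt{\mathrm{SNR}}\,\tilde{\mathbf{x}}_0$, where $\tilde{\mathbf{x}}_0 \in \{0,1\}^{16}$, $\mathrm{supp}(\tilde{\mathbf{x}}_0) = \{3,6,11,14\}$, and SNR varies over the range shown in Fig. 2. Thus, the observation $\mathbf{y}$ is a noisy superposition of four sinusoidal components with identical amplitudes; two of them are cosine and sine components with frequency $\theta_3 = \theta_{11} = \theta_0 + 2\Delta\theta$, and two are cosine and sine components with frequency $\theta_6 = \theta_{14} = \theta_0 + 5\Delta\theta$. In Fig. 2, we plot $v(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x}_0)$ versus SNR. For comparison, we also plot the lower bounds $B^{(1)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$, $B^{(2)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$, and $B^{(3)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$ in (59), with $\mathbf{c}_{\mathrm{OMP}}(\mathbf{x}) = \mathbf{b}(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x})$ being the actual bias function of the OMP estimator $\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot)$. To evaluate these bounds, we computed the first-order partial derivatives of the bias functions $c_{\mathrm{OMP},k}(\mathbf{x})$ (see Theorems VI.1 and VI.2) by means of (40) and Monte Carlo simulation (see [28] for details). The index sets $\mathcal{K}_k$ in the bounds $B^{(2)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$ and $B^{(3)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$ were chosen as $\mathcal{K}_k = \mathrm{supp}(\mathbf{x}_0)$ for $k \in \mathrm{supp}(\mathbf{x}_0)$ and $\mathcal{K}_k = \{k\}$ for $k \notin \mathrm{supp}(\mathbf{x}_0)$. This is the simplest nontrivial choice of the $\mathcal{K}_k$ for which $B^{(3)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$ is tighter than the state-of-the-art bound $B^{(1)}(\mathbf{c}_{\mathrm{OMP}}(\cdot),\mathbf{x}_0)$ (the sparse CRB, which was originally presented in [11]). Finally, Fig. 2 also shows the "oracle CRB," which is defined as the CRB for known $\mathrm{supp}(\mathbf{x}_0)$. This is simply the CRB for a linear Gaussian model with system matrix $\mathbf{H}_{\mathrm{supp}(\mathbf{x}_0)}$ and is thus given by $\mathrm{tr}\big((\mathbf{H}_{\mathrm{supp}(\mathbf{x}_0)}^T\mathbf{H}_{\mathrm{supp}(\mathbf{x}_0)})^{-1}\big) \approx 4.19$ [?] for all values of SNR (recall that we set $\sigma^2 = 1$).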

As can be seen from Fig. 2, for SNR below 20 dB, $v(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x}_0)$ is significantly higher than the four lower bounds (the three bounds in (59) and the oracle CRB). This suggests that there might exist estimators with the same bias as that of the OMP estimator but a smaller variance; however, a positive statement regarding the existence of such estimators cannot be based on our analysis. For SNR larger than about 15 dB, the four lower bounds coincide. Furthermore, for SNR larger than about 11 dB, $v(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x}_0)$ quickly converges toward the lower bounds. This is because for high SNR, the OMP estimator is able to detect $\mathrm{supp}(\mathbf{x}_0)$ with very high probability. Note also that the results in Fig. 2 agree with our observation in Section VI-B around (36) that the bound $B^{(3)}(\mathbf{c}(\cdot),\mathbf{x}_0)$ tends to be higher than $B^{(2)}(\mathbf{c}(\cdot),\mathbf{x}_0)$.
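For readers who want to reproduce this kind of experiment, a textbook OMP loop is sketched below. It is a generic implementation (greedy selection of the column with the largest normalized correlation with the residual, followed by a least-squares refit on the selected columns), not necessarily the exact variant used for Fig. 2; the name `omp` is ours. With $\mathbf{H} = \mathbf{I}$, $S$ iterations simply keep the $S$ largest-magnitude entries of $\mathbf{y}$, i.e., OMP reduces to the operator $\mathsf{P}_S$ used for the ML estimator in the SSNM.

```python
import numpy as np


def omp(H, y, n_iter):
    """Generic orthogonal matching pursuit: greedy column selection plus LS refit."""
    y = np.asarray(y, dtype=float)
    col_norms = np.linalg.norm(H, axis=0)
    residual = y.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_iter):
        corr = np.abs(H.T @ residual) / col_norms  # normalized correlations
        corr[support] = -np.inf                    # never select an index twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
        residual = y - H[:, support] @ coef        # orthogonal to the selected columns
    x_hat = np.zeros(H.shape[1])
    x_hat[support] = coef
    return x_hat
```

To estimate $v(\hat{\mathbf{x}}_{\mathrm{OMP}}(\cdot);\mathbf{x}_0)$ by Monte Carlo, one would call `omp(H, H @ x0 + noise, S)` for many noise draws and sum the per-entry sample variances.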

B. Minimum Variance Analysis for the SSNM 

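Next, we consider the maximum likelihood (ML) estimator and the hard-thresholding (HT) estimator for the SSNM, i.e., for $M = N$ and $\mathbf{H} = \mathbf{I}$, with $N = 50$, $S = 5$, and $\sigma^2 = 1$. The ML estimator is given by

$$\hat{\mathbf{x}}_{\mathrm{ML}}(\mathbf{y}) = \operatorname*{arg\,max}_{\mathbf{x}'\in\mathcal{X}_S} f(\mathbf{y};\mathbf{x}') = \mathsf{P}_S(\mathbf{y}),$$

where the operator $\mathsf{P}_S$ retains the $S$ largest (in magnitude) entries and zeros all other entries. Closed-form expressions of the mean and variance of the ML estimator were derived in [13]. The HT estimator $\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot)$ is given by

$$\hat{x}_{\mathrm{HT},k}(\mathbf{y}) = \hat{x}_{\mathrm{HT},k}(y_k) = \begin{cases} y_k, & |y_k| > T \\ 0, & \text{else} \end{cases}, \qquad k \in [N], \tag{60}$$

where $T$ is a fixed threshold. Note that in the limiting case $T = 0$, the HT estimator coincides with the LS estimator $\hat{\mathbf{x}}_{\mathrm{LS}}(\mathbf{y}) = \mathbf{y}$ [13], [15], [27]. The mean and variance of the HT estimator are given by

$$\mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{HT},k}(\mathbf{y})\} = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}\setminus[-T,T]} y \exp\!\Big(-\frac{1}{2\sigma^2}(y - x_k)^2\Big)\, dy \tag{61}$$

$$v(\hat{x}_{\mathrm{HT},k}(\cdot);\mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathbb{R}\setminus[-T,T]} y^2 \exp\!\Big(-\frac{1}{2\sigma^2}(y - x_k)^2\Big)\, dy - \big(\mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{HT},k}(\mathbf{y})\}\big)^2. \tag{62}$$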
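The integrals (61) and (62) also admit closed forms in terms of the Gaussian tail function $Q(\cdot)$ and density $\varphi(\cdot)$. With $\alpha = (T - x_k)/\sigma$ and $\beta = (-T - x_k)/\sigma$, the first moment is $x_k[Q(\alpha) + Q(-\beta)] + \sigma[\varphi(\alpha) - \varphi(\beta)]$ and the second moment is $(x_k^2 + \sigma^2)[Q(\alpha) + Q(-\beta)] + \sigma[(x_k + T)\varphi(\alpha) - (x_k - T)\varphi(\beta)]$. The sketch below uses these closed forms instead of the numerical integration mentioned in the text (a substitution on our part), cross-checks them against a midpoint rule, and adds straightforward implementations of $\mathsf{P}_S$ and of (60). All function names are hypothetical.

```python
import math


def _Q(t):
    """Gaussian tail probability Q(t) = P{Z > t}, Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def ml_ssnm(y, S):
    """P_S(y): keep the S largest-magnitude entries of y and zero all others."""
    keep = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)[:S]
    out = [0.0] * len(y)
    for i in keep:
        out[i] = y[i]
    return out


def ht(y, T):
    """Hard-thresholding estimator (60), applied entrywise."""
    return [yk if abs(yk) > T else 0.0 for yk in y]


def ht_mean_var(xk, T, sigma=1.0):
    """Closed-form mean (61) and variance (62) of one HT entry, y_k ~ N(xk, sigma^2)."""
    a = (T - xk) / sigma    # standardized upper threshold
    b = (-T - xk) / sigma   # standardized lower threshold
    p_out = _Q(a) + _Q(-b)  # P{|y_k| > T}
    m1 = xk * p_out + sigma * (_pdf(a) - _pdf(b))
    m2 = (xk**2 + sigma**2) * p_out + sigma * ((xk + T) * _pdf(a) - (xk - T) * _pdf(b))
    return m1, m2 - m1**2


def ht_mean_var_midpoint(xk, T, sigma=1.0, h=1e-4, half_width=12.0):
    """Midpoint-rule evaluation of (61)-(62) on [xk - w, xk + w], for cross-checking."""
    n = int(round(2 * half_width / h))
    m1 = m2 = 0.0
    for i in range(n):
        y = xk - half_width + (i + 0.5) * h
        if abs(y) > T:
            dens = math.exp(-((y - xk) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
            m1 += y * dens * h
            m2 += y * y * dens * h
    return m1, m2 - m1**2
```

For $T = 0$ the closed forms return mean $x_k$ and variance $\sigma^2$, matching the LS limiting case noted above.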

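We calculated the variances $v(\hat{\mathbf{x}}_{\mathrm{ML}}(\cdot);\mathbf{x}_0)$ and $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$ at parameter vectors $\mathbf{x}_0 = \sqrt{\mathrm{SNR}}\,\tilde{\mathbf{x}}_0$, where $\tilde{\mathbf{x}}_0 \in \{0,1\}^{50}$, $\mathrm{supp}(\tilde{\mathbf{x}}_0) = [S]$, and SNR varies over the range shown in Fig. 3. (The fixed choice $\mathrm{supp}(\tilde{\mathbf{x}}_0) = [S]$ is justified by the fact that neither the variances of the ML and HT estimators nor the corresponding variance bounds depend on the location of $\mathrm{supp}(\mathbf{x}_0)$.) In particular, $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$ was calculated by numerical evaluation of the integrals (62) and (61). Fig. 3 shows $v(\hat{\mathbf{x}}_{\mathrm{ML}}(\cdot);\mathbf{x}_0)$ and $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$, the latter for four different choices of $T$ in (60), versus SNR. Also shown are the lower bounds $B^{(2)}(\mathbf{c}_{\mathrm{ML}}(\cdot),\mathbf{x}_0)$ and $B^{(3)}(\mathbf{c}_{\mathrm{ML}}(\cdot),\mathbf{x}_0)$ as well as $B^{(2)}(\mathbf{c}_{\mathrm{HT}}(\cdot),\mathbf{x}_0)$ and $B^{(3)}(\mathbf{c}_{\mathrm{HT}}(\cdot),\mathbf{x}_0)$ (cf. (59)), with $\mathbf{c}_{\mathrm{ML}}(\cdot)$ and $\mathbf{c}_{\mathrm{HT}}(\cdot)$ being the actual bias functions of $\hat{\mathbf{x}}_{\mathrm{ML}}(\cdot)$ and of $\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot)$, respectively. The index sets underlying the bounds were chosen as $\mathcal{K}_k = \mathrm{supp}(\mathbf{x}_0)$ for $k \in \mathrm{supp}(\mathbf{x}_0)$ and $\mathcal{K}_k = \{k\} \cup (\mathrm{supp}(\mathbf{x}_0)\setminus\{j_S\})$ for $k \notin \mathrm{supp}(\mathbf{x}_0)$, where $j_S$ denotes the index of the $S$-largest (in magnitude) entry of $\mathbf{x}_0$. For this choice of the $\mathcal{K}_k$, the two bounds are equal, i.e., $B^{(2)}(\mathbf{c}_{\mathrm{ML}}(\cdot),\mathbf{x}_0) = B^{(3)}(\mathbf{c}_{\mathrm{ML}}(\cdot),\mathbf{x}_0)$ and $B^{(2)}(\mathbf{c}_{\mathrm{HT}}(\cdot),\mathbf{x}_0) = B^{(3)}(\mathbf{c}_{\mathrm{HT}}(\cdot),\mathbf{x}_0)$. The first-order partial derivatives of the bias functions $c_{\mathrm{ML},k}(\mathbf{x})$ involved in the bounds $B^{(2/3)}(\mathbf{c}_{\mathrm{ML}}(\cdot),\mathbf{x}_0)$ were approximated by a finite-difference quotient [28], i.e.,

$$\frac{\partial c_{\mathrm{ML},k}(\mathbf{x})}{\partial x_l} = -\delta_{k,l} + \frac{\partial \mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{ML},k}(\mathbf{y})\}}{\partial x_l}$$

with

$$\frac{\partial \mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{ML},k}(\mathbf{y})\}}{\partial x_l} \approx \frac{\mathrm{E}_{\mathbf{x}+\Delta\mathbf{e}_l}\{\hat{x}_{\mathrm{ML},k}(\mathbf{y})\} - \mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{ML},k}(\mathbf{y})\}}{\Delta},$$

where $\Delta > 0$ is a small stepsize and the expectations were calculated using the closed-form expressions presented in [13, Appendix I]. The first-order partial derivatives of the bias functions $c_{\mathrm{HT},k}(\mathbf{x})$ involved in the bounds $B^{(2/3)}(\mathbf{c}_{\mathrm{HT}}(\cdot),\mathbf{x}_0)$ were calculated by means of (40).

Fig. 3. Variance of the ML and HT estimators and corresponding lower bounds versus SNR, for the SSNM with $N = 50$, $S = 5$, and $\sigma^2 = 1$.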
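The finite-difference quotient above is easy to sanity-check on an estimator whose mean function is available in closed form. The sketch below applies it to the HT mean (61). With $\alpha = (T - x_k)/\sigma$ and $\beta = (-T - x_k)/\sigma$, a short computation (our own, not from the text) gives $\partial \mathrm{E}_{\mathbf{x}}\{\hat{x}_{\mathrm{HT},k}(\mathbf{y})\}/\partial x_k = Q(\alpha) + Q(-\beta) + (T/\sigma)[\varphi(\alpha) + \varphi(\beta)]$, while the derivative with respect to $x_l$, $l \neq k$, is zero. This only illustrates the quotient; it is not the ML computation of the text, which relies on the closed-form ML mean from [13]. The helper names and the stepsize $\Delta = 10^{-6}$ are assumptions.

```python
import math


def _Q(t):
    """Gaussian tail probability Q(t) = P{Z > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def ht_mean(xk, T, sigma=1.0):
    """E_x{x_HT,k(y)} as in (61), written via Gaussian tail functions."""
    a, b = (T - xk) / sigma, (-T - xk) / sigma
    return xk * (_Q(a) + _Q(-b)) + sigma * (_pdf(a) - _pdf(b))


def forward_difference(f, x, delta=1e-6):
    """Finite-difference quotient (f(x + delta) - f(x)) / delta."""
    return (f(x + delta) - f(x)) / delta


def ht_mean_derivative(xk, T, sigma=1.0):
    """Analytic derivative of ht_mean with respect to xk, for comparison."""
    a, b = (T - xk) / sigma, (-T - xk) / sigma
    return _Q(a) + _Q(-b) + (T / sigma) * (_pdf(a) + _pdf(b))


xk, T = 0.8, 2.0
d_fd = forward_difference(lambda x: ht_mean(x, T), xk)
d_exact = ht_mean_derivative(xk, T)
dc_dxk = d_fd - 1.0  # bias derivative for l = k: subtract delta_{k,k} = 1
```

A forward difference has an $O(\Delta)$ truncation error; a central difference would reduce it to $O(\Delta^2)$ at the cost of one more mean evaluation per coordinate.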

It can be seen in Fig. 3 that for SNR larger than about 18 dB, the variances of the ML and HT estimators and the corresponding bounds are effectively equal (for the HT estimator, this is true if $T$ is not too small). Also, all bounds are close to $S\sigma^2 = 5$; this equals the variance of an oracle estimator that knows $\mathrm{supp}(\mathbf{x}_0)$ and is given by $\hat{x}_k(\mathbf{y}) = y_k$ for $k \in \mathrm{supp}(\mathbf{x}_0)$ and $\hat{x}_k(\mathbf{y}) = 0$ otherwise. However, in the medium-SNR range, the variances of the ML and HT estimators are significantly higher than the corresponding lower bounds. We can conclude that there might exist estimators with the same bias as that of the ML or HT estimator but a smaller variance; however, in general, a positive statement regarding the existence of such estimators cannot be based on our analysis.

On the other hand, for the special case of diagonal estimators, such as the HT estimator, Theorem VIII.3 and Corollary VIII.5 make positive statements about the existence of estimators that have locally a smaller variance than the HT estimator. In particular, we can use Corollary VIII.5 to obtain the LMV estimator and corresponding minimum achievable variance at a parameter vector $\mathbf{x}_0 \in \mathcal{X}_S$ for the given bias function of the HT estimator, $\mathbf{c}_{\mathrm{HT}}(\cdot)$. In Fig. 4, we plot the variance $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$ for four different choices of $T$ versus SNR. We also plot the corresponding minimum achievable variance (Barankin bound) $M_{\mathrm{HT}}(\mathbf{x}_0) \triangleq \sum_{k\in[N]} M_{\mathrm{SSNM}}(c_{\mathrm{HT},k}(\cdot),\mathbf{x}_0)$. Here, $M_{\mathrm{SSNM}}(c_{\mathrm{HT},k}(\cdot),\mathbf{x}_0)$ was obtained from (53) in Corollary VIII.5. (Note that (53) is applicable because the estimator $\hat{x}_{\mathrm{HT},k}(\mathbf{y})$ is diagonal and has finite variance at all $\mathbf{x}_0 \in \mathcal{X}_S$.) It is seen that for small $T$ (including $T = 0$, where the HT estimator reduces to the LS estimator) and for SNR above 0 dB, $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$ is significantly higher than $M_{\mathrm{HT}}(\mathbf{x}_0)$. However, as $T$ increases, the gap between the $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$ and $M_{\mathrm{HT}}(\mathbf{x}_0)$ curves becomes smaller; in particular, the two curves are almost indistinguishable already for $T = 4$. For high SNR, $M_{\mathrm{HT}}(\mathbf{x}_0)$ approaches the oracle variance $S\sigma^2 = 5$ for any value of $T$.

Fig. 4. Variance of the HT estimator, $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot);\mathbf{x}_0)$, for different $T$ (solid lines) and corresponding minimum achievable variance (Barankin bound) $M_{\mathrm{HT}}(\mathbf{x}_0)$ (dashed lines) versus SNR, for the SSNM with $N = 50$, $S = 5$, and $\sigma^2 = 1$.
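Because every HT entry is diagonal, $M_{\mathrm{HT}}(\mathbf{x}_0)$ follows from (53) using only scalar quantities. For $\mathbf{x}_0 = \sqrt{\mathrm{SNR}}\,\tilde{\mathbf{x}}_0$ with $\mathrm{supp}(\tilde{\mathbf{x}}_0) = [S]$, there are two cases. For $k \in \mathrm{supp}(\mathbf{x}_0)$, we have $\phi(\mathbf{x}_0) = 1$, and the bound equals $v(\hat{x}_{\mathrm{HT},k}(\cdot);\mathbf{x}_0)$. For $k \notin \mathrm{supp}(\mathbf{x}_0)$, the mean $\gamma(\mathbf{x}_0) = \mathrm{E}_{\mathbf{x}_0}\{\hat{x}_{\mathrm{HT},k}(\mathbf{y})\}$ vanishes by symmetry, so (53) reduces to $\phi(\mathbf{x}_0)\,v(\hat{x}_{\mathrm{HT},k}(\cdot);\mathbf{x}_0)$. Because all nonzero entries are equal, the sum in (49) telescopes to $\phi(\mathbf{x}_0) = 1 - [1 - \exp(-\mathrm{SNR}/\sigma^2)]^S$. The sketch below encodes this. Its helper names are hypothetical, and it computes the HT variances from closed-form Gaussian tail expressions rather than by numerical integration. At high SNR it returns approximately $S\sigma^2$.

```python
import math


def _Q(t):
    """Gaussian tail probability Q(t) = P{Z > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def ht_var(xk, T, sigma=1.0):
    """Variance (62) of one HT entry for y_k ~ N(xk, sigma^2), in closed form."""
    a, b = (T - xk) / sigma, (-T - xk) / sigma
    p_out = _Q(a) + _Q(-b)
    m1 = xk * p_out + sigma * (_pdf(a) - _pdf(b))
    m2 = (xk**2 + sigma**2) * p_out + sigma * ((xk + T) * _pdf(a) - (xk - T) * _pdf(b))
    return m2 - m1**2


def ht_total_var(snr, T, N=50, S=5, sigma2=1.0):
    """v(x_HT; x0) = sum of the per-entry variances."""
    sigma, amp = math.sqrt(sigma2), math.sqrt(snr)
    return S * ht_var(amp, T, sigma) + (N - S) * ht_var(0.0, T, sigma)


def barankin_ht(snr, T, N=50, S=5, sigma2=1.0):
    """M_HT(x0) = sum_k M_SSNM(c_HT,k, x0) via (53), for x0 = sqrt(SNR) * 1_[S]."""
    sigma, amp = math.sqrt(sigma2), math.sqrt(snr)
    # k in supp(x0): phi(x0) = 1, so (53) gives M = v(x_HT,k; x0)
    m_in = S * ht_var(amp, T, sigma)
    # k not in supp(x0): gamma(x0) = 0, so (53) gives M = phi(x0) * v(x_HT,k; x0)
    phi = 1.0 - (1.0 - math.exp(-amp**2 / sigma2)) ** S
    m_out = (N - S) * phi * ht_var(0.0, T, sigma)
    return m_in + m_out
```

Sweeping `snr` over a logarithmic grid for a few thresholds $T$ yields curves of the kind compared in Fig. 4.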



XI. Conclusion 

We used RKHS theory to analyze the MVE problem within the sparse linear Gaussian model (SLGM). In the SLGM, the unknown parameter vector to be estimated is assumed to be sparse with a known sparsity degree, and the observed vector is a linearly transformed version of the parameter vector that is corrupted by i.i.d. Gaussian noise with a known variance. The RKHS framework allowed us to establish a geometric interpretation of existing lower bounds on the estimator variance and to derive novel lower bounds on the estimator variance, in both cases under a bias constraint. These bounds were obtained by an orthogonal projection of the prescribed mean function onto a subspace of the RKHS associated with the SLGM. Viewed as functions of the SNR, the bounds were observed to vary between two extreme regimes. On the one hand, there is a low-SNR regime where the entries of the true parameter vector are small compared with the noise variance. Here, our bounds predict that if the estimator bias is approximately zero, the a priori sparsity information does not help much in the estimation; however, if the bias is allowed to be nonzero, the estimator variance can be reduced by the sparsity information. On the other hand, there is a high-SNR regime where the nonzero entries of the true parameter vector are large compared with the noise variance. Here, our bounds coincide with the Cramér–Rao bound of an associated conventional linear Gaussian model in which the support of the unknown parameter vector is supposed known. Our bounds exhibit a steep transition between these two regimes. In general, this transition has an exponential decay.

For the special case of the SLGM that corresponds to the recovery problem in a linear compressed sensing 
scheme, we expressed our lower bounds in terms of the restricted isometry and coherence parameters of the 
measurement matrix. Furthermore, for the special case of the SLGM given by the sparse signal in noise model 
(SSNM), we derived closed-form expressions of the minimum achievable variance and the corresponding LMV 
estimator. These latter results include closed-form expressions of the (unbiased) Barankin bound and of the 
LMVU estimator for the SSNM. Simplified expressions of the minimum achievable variance and the LMV 
estimator were presented for the subclass of "diagonal" bias functions. 

An analysis of the effects of exact and approximate sparsity information from the MVE perspective showed 
that the minimum achievable variance under an exact sparsity constraint is not a limiting case of the minimum 
achievable variance under an approximate sparsity constraint. 

Finally, a comparison of our bounds with the actual variance of established estimators for the SLGM and SSNM (maximum likelihood estimator, hard-thresholding estimator, least squares estimator, and orthogonal matching pursuit) showed that there might exist estimators with the same bias but a smaller variance.

An interesting direction for future investigations is the search for (classes of) estimators that asymptotically approach our lower variance bounds when the estimation is based on an increasing number of i.i.d. observation vectors $\mathbf{y}_i$. In the unbiased case, the maximum likelihood estimator can be intuitively expected to achieve the variance bounds asymptotically. However, a rigorous proof of this conjecture seems to be nontrivial. Indeed, most studies of the asymptotic behavior of maximum likelihood estimators assume that the parameter set is an open subset of $\mathbb{R}^N$ [15], [49], [50], which is not the case for the parameter set $\mathcal{X}_S$. For the popular class of M-estimators or penalized maximum likelihood estimators, a characterization of the asymptotic behavior is available [30], [50], [51]. Under mild conditions, M-estimators allow an efficient implementation via convex optimization techniques.

Furthermore, it would be interesting to generalize our results to the case of block or group sparsity [52]–[54]. This could be useful, e.g., for sparse channel estimation in the case of clustered scatterers and delay-Doppler leakage [55] and for the estimation of structured sparse spectra (extending sparsity-exploiting spectral estimation as proposed in [56]–[59]).